MAXIMUM Lq-LIKELIHOOD ESTIMATION

By Davide Ferrari and Yuhong Yang 1 

Università di Modena e Reggio Emilia and University of Minnesota

In this paper, the maximum Lq-likelihood estimator (MLqE), a
new parameter estimator based on nonextensive entropy [Kibernetika
3 (1967) 30-35] is introduced. The properties of the MLqE are studied
via asymptotic analysis and computer simulations. The behavior
of the MLqE is characterized by the degree of distortion q applied to
the assumed model. When q is properly chosen for small and moderate
sample sizes, the MLqE can successfully trade bias for precision,
resulting in a substantial reduction of the mean squared error. When
the sample size is large and q tends to 1, a necessary and sufficient
condition to ensure a proper asymptotic normality and efficiency of
MLqE is established.

1. Introduction. One of the major contributions to scientific thought of
the last century is information theory founded by Claude Shannon in the
late 1940s. Its triumph is highlighted by countless applications in various
scientific domains including statistics. The central quantity in information
theory is a measure of the "amount of uncertainty" inherent in a probability
distribution (usually called Shannon's entropy). Given a probability density
function $p(x)$ for a random variable $X$, Shannon's entropy is defined as
$\mathcal{H}(X) = -E[\log p(X)]$. The quantity $-\log p(x)$ is interpreted as the
information content of the outcome $x$, and $\mathcal{H}(X)$ represents the average
uncertainty removed after the actual outcome of $X$ is revealed. The connection between
logarithmic (or additive) entropies and inference has been copiously studied
(see, e.g., Cover and Thomas [9]). Akaike [3] introduced a principle of
statistical model building based on minimization of entropy. In a parametric
setting, he pointed out that the usual inferential task of maximizing the
log-likelihood function can be equivalently regarded as minimization of the
empirical version of Shannon's entropy, $-\sum_{i=1}^{n} \log p(X_i)$. Rissanen proposed
the well-known minimum description length criterion for model comparison
(see, e.g., Barron, Rissanen and Yu [5]).

Received November 2007; revised January 2009.
Supported in part by NSF Grant DMS-07-06850.

AMS 2000 subject classifications. Primary 62F99; secondary 60F05, 94A17, 62G32.
Key words and phrases. Maximum Lq-likelihood estimation, nonextensive entropy,
asymptotic efficiency, exponential family, tail probability estimation.

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Statistics,
2010, Vol. 38, No. 2, 753-783. This reprint differs from the original in pagination
and typographic detail.

Since the introduction of Shannon's entropy, other and more general measures
of information have been developed. Rényi [27] and Aczél and Daróczy
[2] in the mid-1960s and 1970s proposed generalized notions of information
(usually referred to as Rényi entropies) by keeping the additivity of
independent information, but using a more general definition of mean. In a
different direction, Havrda and Charvát [16] proposed nonextensive entropies,
sometimes referred to as q-order entropies, where the usual definition of
mean is maintained while the logarithm is replaced by the more general
function $L_q(u) = (u^{1-q} - 1)/(1 - q)$ for $q > 0$. In particular, when $q \to 1$,
$L_q(u) \to \log(u)$, recovering the usual Shannon's entropy.

In recent years, q-order entropies have been of considerable interest in
different domains of application. Tsallis and colleagues have successfully
exploited them in physics (see, e.g., [29] and [30]). In thermodynamics, the
q-entropy functional is usually minimized subject to some properly chosen
constraints, according to the formalism proposed by Jaynes [19] and [20].
There is a large literature on analyzing various loss functions as the convex
dual of entropy minimization, subject to constraints. From this standpoint,
the classical maximum entropy estimation and maximum likelihood are seen
as convex duals of each other (see, e.g., Altun and Smola [4]). Since Tsallis'
seminal paper [29], q-order entropy has encountered an increasing wave of
success and Tsallis' nonextensive thermodynamics, based on such an information
measure, is nowadays considered the most viable candidate for generalizing
the ideas of the famous Boltzmann-Gibbs theory. More recently,
a number of applications based on the q-entropy have appeared in other
disciplines such as finance, biomedical sciences, environmental sciences and
linguistics [14].

Despite the broad success, so far little effort has been made to address the
inferential implications of using nonextensive entropies from a statistical
perspective. In this paper, we study a new class of parametric estimators based
on the q-entropy function, the maximum Lq-likelihood estimator (MLqE).
In our approach, the role of the observations is modified by slightly changing
the model of reference by means of the distortion parameter q. From this
standpoint, Lq-likelihood estimation can be regarded as the minimization
of the discrepancy between a distribution in a family and one that modifies
the true distribution to diminish (or emphasize) the role of extreme
observations.

In this framework, we provide theoretical insights concerning the statisti- 
cal usage of the generalized entropy function. In particular, we highlight the 
role of the distortion parameter q and give the conditions that guarantee 
asymptotic efficiency of the MLqE. Further, the new methodology is shown
to be very useful when estimating high-dimensional parameters and small
tail probabilities. This aspect is important in many applications where we
must deal with the fact that the number of observations available is not large 
in relation to the number of parameters or the probability of occurrence of 
the event of interest. Standard large sample theory guarantees that the max- 
imum likelihood estimator (MLE) is asymptotically efficient, meaning that 
when the sample size is large, the MLE is at least as accurate as any other 
estimator. However, for a moderate or small sample size, it turns out that 
the MLqE can offer a dramatic improvement in terms of mean squared error 
at the expense of a slightly increased bias, as will be seen in our numerical 
results. 

For finite sample performance of MLqE, not only the size of $q_n - 1$ but
also its sign (i.e., the direction of distortion) is important. It turns out that
for different families or different parametric functions of the same family,
the beneficial direction of distortion can be different. In addition, for some
parameters, MLqE does not produce any improvement. We have found that
an asymptotic variance expression of the MLqE is very helpful to decide the
direction of distortion for applications.

The paper is organized as follows. In Section 2, we examine some
information-theoretical quantities and introduce the MLqE; in Section 3, we present its
basic asymptotic properties for exponential families. In particular, a necessary
and sufficient condition on the choice of q in terms of the sample size to
ensure a proper asymptotic normality and efficiency is established. A
generalization that goes beyond the exponential family is presented in Section 4.
In Section 5, we consider the plug-in approach for tail probability estimation
based on MLqE. The asymptotic properties of the plug-in estimator are
derived and its efficiency is compared to the traditional MLE. In Section 6,
we discuss the choice of the distortion parameter q. In Section 7, we present
Monte Carlo simulations and examine the behavior of MLqE in finite sample
situations. In Section 8, concluding remarks are given. Technical proofs of
the theorems are deferred to Appendix A.

2. Generalized entropy and the maximum Lq-likelihood estimator. Consider
a $\sigma$-finite measure $\mu$ on a measurable space $(\Omega, \mathcal{F})$. The
Kullback-Leibler (KL) divergence [21, 22] (or relative entropy) between two density
functions $g$ and $f$ with respect to $\mu$ is

(2.1)    $\mathcal{D}(f\|g) = E_f \log\frac{f(X)}{g(X)} = \int_\Omega f(x) \log\frac{f(x)}{g(x)} \, d\mu(x).$

Note that finding the density $g$ that minimizes $\mathcal{D}(f\|g)$ is equivalent to
minimizing Shannon's entropy [28] $\mathcal{H}(f,g) = -E_f \log g(X)$.

Definition 2.1. Let $f$ and $g$ be two density functions. The q-entropy
of $g$ with respect to $f$ is defined as

(2.2)    $\mathcal{H}_q(f,g) = -E_f L_q\{g(X)\}, \qquad q > 0,$

where $L_q(u) = \log u$ if $q = 1$ and $L_q(u) = (u^{1-q} - 1)/(1 - q)$, otherwise.

The function $L_q$ represents a Box-Cox transformation in statistics and
in other contexts it is often called a deformed logarithm. Note that if $q \to 1$,
then $L_q(u) \to \log(u)$ and the usual definition of Shannon's entropy is
recovered.
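As a minimal numerical sketch (not part of the original paper), the deformed logarithm of Definition 2.1 can be coded directly. The function name `lq` is an illustrative choice; the loop only illustrates that $L_q(u)$ approaches $\log u$ as $q \to 1$.

```python
import math


def lq(u, q):
    """Deformed logarithm L_q(u) for u > 0.

    Returns log(u) when q == 1 and (u**(1 - q) - 1) / (1 - q) otherwise.
    """
    if u <= 0:
        raise ValueError("u must be positive")
    if q == 1:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


# As q approaches 1, L_q(2) moves toward log(2).
for q in (0.5, 0.9, 0.99, 0.999):
    print(q, lq(2.0, q), math.log(2.0))
```

For $q \neq 1$ the function is a power transformation, which is why the text relates it to the Box-Cox family.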

Let $\mathcal{M} = \{f(x;\theta), \theta \in \Theta\}$ be a family of parametrized density functions
and suppose that the true density of observations, denoted by $f(x;\theta_0)$, is a
member of $\mathcal{M}$. Assume further that $\mathcal{M}$ is closed under the transformation

(2.3)    $f(x;\theta)^{(r)} = \frac{f(x;\theta)^{r}}{\int_\Omega f(x;\theta)^{r} \, d\mu(x)}, \qquad r > 0.$

The transformed density $f(x;\theta)^{(r)}$ is often referred to as zooming or escort
distribution [1, 7, 26] and the parameter $r$ provides a tool to accentuate
different regions of the untransformed true density $f(x;\theta)$. In particular,
when $r < 1$, regions with density values close to zero are accentuated, while
for $r > 1$, regions with density values further from zero are emphasized.
Consider the following KL divergence between $f(x;\theta)$ and $f(x;\theta_0)^{(r)}$:

(2.4)    $\mathcal{D}_r(\theta_0\|\theta) = \int_\Omega f(x;\theta_0)^{(r)} \log\frac{f(x;\theta_0)^{(r)}}{f(x;\theta)} \, d\mu(x).$

Let $\theta^*$ be the value such that $f(x;\theta^*) = f(x;\theta_0)^{(r)}$ and assume that
differentiation can be passed under the integral sign. Then, clearly $\theta^*$ minimizes
$\mathcal{D}_r(\theta_0\|\theta)$ over $\theta$. Let $\theta^{**}$ be the value such that $f(x;\theta^{**}) = f(x;\theta_0)^{(1/q)}$,
$q > 0$. Since we have $\nabla_\theta \mathcal{H}_q(\theta_0,\theta)|_{\theta^{**}} = 0$ and $\nabla^2_\theta \mathcal{H}_q(\theta_0,\theta)|_{\theta^{**}}$ is positive
definite, $\mathcal{H}_q(\theta_0,\theta)$ has a minimum at $\theta^{**}$.

The derivations above show that the minimizer of $\mathcal{D}_r(\theta_0\|\theta)$ over $\theta$ is the same
as the minimizer of $\mathcal{H}_q(\theta_0,\theta)$ over $\theta$ when $q = 1/r$. Clearly, by considering
the divergence with respect to a distorted version of the true density we
introduce a certain amount of bias. Nevertheless, the bias can be properly
controlled by an adequate choice of the distortion parameter q, and later we
shall discuss the benefits gained from paying such a price for parameter
estimation. The next definition introduces the estimator based on the empirical
version of the q-entropy.
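The claim that the q-entropy is minimized at the escort-type parameter can be checked numerically in a simple case. The sketch below assumes the exponential density $f(x;\lambda) = \lambda e^{-\lambda x}$, for which $E_{\lambda_0} f(X;\lambda)^{1-q} = \lambda^{1-q}\lambda_0/\{\lambda_0 + (1-q)\lambda\}$ whenever $\lambda_0 + (1-q)\lambda > 0$. This closed form, the function names and the grid search are illustrative assumptions, not the authors' code. Since $f(x;\lambda_0)^{(1/q)}$ is exponential with rate $\lambda_0/q$, the grid minimizer should sit near $\lambda_0/q$.

```python
def q_entropy_exponential(lam, lam0, q):
    """H_q(f_lam0, f_lam) = -E_{lam0} L_q{f(X; lam)} for exponential densities, q != 1."""
    rate = lam0 + (1.0 - q) * lam
    if rate <= 0:
        raise ValueError("expectation does not exist for this (lam, lam0, q)")
    # Closed form of E_{lam0}[ f(X; lam)^(1 - q) ].
    expectation = lam ** (1.0 - q) * lam0 / rate
    return -(expectation - 1.0) / (1.0 - q)


def argmin_on_grid(func, lower, upper, steps=20000):
    """Return the grid point in [lower, upper] with the smallest function value."""
    best_x, best_val = lower, func(lower)
    for i in range(1, steps + 1):
        x = lower + (upper - lower) * i / steps
        val = func(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x


lam0 = 2.0
for q in (0.8, 1.2):
    lam_min = argmin_on_grid(lambda lam: q_entropy_exponential(lam, lam0, q), 0.5, 5.0)
    print(q, lam_min, lam0 / q)
```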

Definition 2.2. Let $X_1, \ldots, X_n$ be an i.i.d. sample from $f(x;\theta_0)$, $\theta_0 \in \Theta$.
The maximum Lq-likelihood estimator (MLqE) of $\theta_0$ is defined as

(2.5)    $\tilde\theta_n = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} L_q[f(X_i;\theta)], \qquad q > 0.$

When $q \to 1$, if the estimator $\tilde\theta_n$ exists, then it approaches the maximum
likelihood estimate of the parameters, which maximizes $\sum_{i=1}^{n} \log f(X_i;\theta)$. In
this sense, the MLqE extends the classic method, resulting in a general
inferential procedure that inherits most of the desirable features of traditional
maximum likelihood, and at the same time can improve over MLE due to
variance reduction, as will be seen.

Define

(2.6)    $U(x;\theta) = \nabla_\theta \log\{f(x;\theta)\}, \qquad U^*(X;\theta,q) = U(X;\theta) f(X;\theta)^{1-q}.$

In general, the estimating equations have the form

(2.7)    $\sum_{i=1}^{n} U^*(X_i;\theta,q) = 0.$

Equation (2.7) offers a natural interpretation of the MLqE as a solution to
a weighted likelihood. When $q \neq 1$, (2.7) provides a relative-to-the-model
re-weighting. Observations that disagree with the model receive low or high
weight depending on $q < 1$ or $q > 1$. In the case $q = 1$, all the observations
receive the same weight.

The strategy of setting weights that are proportional to a power trans- 
formation of the assumed density has some connections with the methods 
proposed by Windham [33], Basu et al. [6] and Choi, Hall and Presnell [8]. 
In these approaches, however, the main objective is robust estimation and 
the weights are set based on a fixed constant not depending on the sample 
size. 

Example 2.1. The simple but illuminating case of an exponential
distribution will be used as a recurrent example in the course of the paper.
Consider an i.i.d. sample of size n from a distribution with density
$\lambda_0 \exp\{-x\lambda_0\}$, $x > 0$ and $\lambda_0 > 0$. In this case, the Lq-likelihood equation is

(2.8)    $\sum_{i=1}^{n} e^{-[X_i\lambda - \log\lambda](1-q)} \left(-X_i + \frac{1}{\lambda}\right) = 0.$

With $q = 1$, the usual maximum likelihood estimator is $\hat\lambda = (\sum_i X_i/n)^{-1} = \bar{X}^{-1}$.
However, when $q \neq 1$, (2.8) can be rewritten as

$\lambda = \left\{ \frac{\sum_{i=1}^{n} X_i w_i(X_i,\lambda,q)}{\sum_{i=1}^{n} w_i(X_i,\lambda,q)} \right\}^{-1},$

where $w_i := e^{-[X_i\lambda - \log\lambda](1-q)}$. When $q < 1$, the role played by observations
corresponding to higher density values is accentuated; when $q > 1$,
observations corresponding to density values close to zero are accentuated.
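The rewritten form of (2.8) suggests a fixed-point iteration: starting from the MLE, recompute the weights and then the weighted mean until the estimate stabilizes. The following sketch is an illustration under that assumption (the paper itself reports using a variable metric algorithm in Section 7); the function names, tolerance and toy data are hypothetical.

```python
import math


def mlq_exponential(xs, q, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for the Lq-likelihood equation of the exponential model."""
    lam = len(xs) / sum(xs)  # MLE used as the starting value
    for _ in range(max_iter):
        # Weights f(x; lam)^(1 - q); the common factor lam^(1 - q) cancels in the ratio.
        w = [math.exp(-(1.0 - q) * lam * x) for x in xs]
        new_lam = sum(w) / sum(x * wi for x, wi in zip(xs, w))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def lq_score(xs, lam, q):
    """Left-hand side of the Lq-likelihood equation (2.8)."""
    return sum(
        math.exp(-(x * lam - math.log(lam)) * (1.0 - q)) * (1.0 / lam - x)
        for x in xs
    )


sample = [0.1, 0.5, 0.7, 1.0, 2.0]  # toy data, for illustration only
print(mlq_exponential(sample, 1.0))  # equals the MLE, 1 / mean
print(mlq_exponential(sample, 0.9))  # q < 1 down-weights the larger observations
```

With $q < 1$ the weights decrease in $X_i$, so the weighted mean is smaller than the sample mean and the resulting estimate of $\lambda$ exceeds the MLE.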
3. Asymptotics of the MLqE for exponential families. In this section,
we present the asymptotic properties of the new estimator when the degree
of distortion is chosen according to the sample size. In the remainder of the
paper, we focus on exponential families, although some generalization results
are presented in Section 4. In particular, we consider density functions of
the form

(3.1)    $f(x;\theta) = \exp\{\theta^T b(x) - A(\theta)\},$

where $\theta \in \Theta \subseteq \mathbb{R}^p$ is a real valued natural parameter vector, $b(x)$ is the vector
of functions with elements $b_j(x)$ $(j = 1, \ldots, p)$ and $A(\theta) = \log \int_\Omega e^{\theta^T b(x)} \, d\mu(x)$
is the cumulant generating function (or log normalizer). For simplicity in
presentation, the family is assumed to be of full rank (but similar results
hold for curved exponential families). The true parameter will be denoted
by $\theta_0$.

3.1. Consistency. Consider the value $\theta_n^*$ such that

(3.2)    $E_{\theta_0} U^*(X;\theta_n^*,q_n) = 0.$

It can be easily shown that $\theta_n^* = \theta_0/q_n$. Since the actual target of $\tilde\theta_n$ is $\theta_n^*$,
to retrieve asymptotic unbiasedness of $\tilde\theta_n$, $q_n$ must converge to 1. We call
$\theta_n^*$ the surrogate parameter of $\theta_0$. We impose the following conditions:

A.1 $q_n > 0$ is a sequence such that $q_n \to 1$ as $n \to \infty$.

A.2 The parameter space $\Theta$ is compact and the parameter $\theta_0$ is an interior
point in $\Theta$.

In similar contexts, the compactness condition on $\Theta$ is used for technical
reasons (see, e.g., Wang, van Eeden and Zidek [32]), as is the case here.

Theorem 3.1. Under assumptions A.1 and A.2, with probability going
to 1, the Lq-likelihood equation yields a unique solution $\tilde\theta_n$ that is the
maximizer of the Lq-likelihood function in $\Theta$. Furthermore, we have $\tilde\theta_n \xrightarrow{P} \theta_0$.

Remark. When $\Theta$ is compact, the MLqE always exists under our
conditions, although it is not necessarily unique with probability one.

3.2. Asymptotic normality.

Theorem 3.2. If assumptions A.1 and A.2 hold, then we have

(3.3)    $\sqrt{n} \, V_n^{-1/2} (\tilde\theta_n - \theta_n^*) \xrightarrow{D} N_p(0, I_p)$ as $n \to \infty$,

where $I_p$ is the $(p \times p)$ identity matrix, $V_n = J_n^{-1} K_n J_n^{-1}$ and

(3.4)    $K_n = E_{\theta_0} [U^*(X;\theta_n^*,q_n)]^T [U^*(X;\theta_n^*,q_n)],$

(3.5)    $J_n = E_{\theta_0} [\nabla_\theta U^*(X;\theta_n^*,q_n)].$

A necessary and sufficient condition for asymptotic normality of MLqE
around $\theta_0$ is $\sqrt{n}(q_n - 1) \to 0$.

Let $m(\theta) := \nabla_\theta A(\theta)$ and $D(\theta) := \nabla^2_\theta A(\theta)$. Note that $K_n$ and $J_n$ can be
expressed as

(3.6)    $K_n = c_{2,n} \left( D(\theta_{2,n}) + [m(\theta_{2,n}) - m(\theta_n^*)][m(\theta_{2,n}) - m(\theta_n^*)]^T \right)$

and

(3.7)    $J_n = c_{1,n}(1 - q_n) D(\theta_{1,n}) - c_{1,n} D(\theta_n^*) + c_{1,n}(1 - q_n)[m(\theta_{1,n}) - m(\theta_n^*)][m(\theta_{1,n}) - m(\theta_n^*)]^T,$

where $c_{k,n} = \exp\{A(\theta_{k,n}) - A(\theta_0)\}$ and $\theta_{k,n} = k\theta_0(1/q_n - 1) + \theta_0$. When $q_n \to 1$,
it is seen that $V_n \to D(\theta_0)^{-1}$, the asymptotic variance of the MLE. When
$\Theta \subseteq \mathbb{R}^1$ we use the notation $\sigma_n^2$ for the asymptotic variance in place of $V_n$.
Note that the existence of moments is ensured by the functional form of
the exponential families (e.g., see [23]).
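As an illustrative check of (3.6) and (3.7), the sketch below evaluates $K_n$, $J_n$ and $V_n = J_n^{-1} K_n J_n^{-1}$ for the exponential density written in natural form, $f(x;\theta) = \exp\{\theta x - A(\theta)\}$ with $\theta = -\lambda < 0$, $b(x) = x$ and $A(\theta) = -\log(-\theta)$. These modelling choices and the function names are assumptions made for the example. The result is compared with $\lambda_0^2 (q^2 - 2q + 2)/[q^5 (2 - q)^3]$, the variance ratio quoted in Section 6.

```python
import math


def log_normalizer(theta):
    """A(theta) = -log(-theta) for the exponential density, theta < 0."""
    return -math.log(-theta)


def mean_fn(theta):
    """m(theta) = A'(theta)."""
    return -1.0 / theta


def var_fn(theta):
    """D(theta) = A''(theta)."""
    return 1.0 / theta ** 2


def sandwich_variance_exponential(lam0, q):
    """Scalar V_n = K_n / J_n^2 from (3.6)-(3.7), valid for 0 < q < 2."""
    theta0 = -lam0
    theta_star = theta0 / q  # surrogate parameter
    theta = {k: k * theta0 * (1.0 / q - 1.0) + theta0 for k in (1, 2)}
    c = {k: math.exp(log_normalizer(theta[k]) - log_normalizer(theta0)) for k in (1, 2)}
    k_n = c[2] * (var_fn(theta[2]) + (mean_fn(theta[2]) - mean_fn(theta_star)) ** 2)
    j_n = (
        c[1] * (1.0 - q) * var_fn(theta[1])
        - c[1] * var_fn(theta_star)
        + c[1] * (1.0 - q) * (mean_fn(theta[1]) - mean_fn(theta_star)) ** 2
    )
    return k_n / j_n ** 2


for q in (0.8, 1.0, 1.3):
    closed_form = 2.0 ** 2 * (q * q - 2 * q + 2) / (q ** 5 * (2 - q) ** 3)
    print(q, sandwich_variance_exponential(2.0, q), closed_form)
```

At $q = 1$ the sketch returns $\lambda_0^2$, which is $D(\theta_0)^{-1}$ for this family.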

Remarks. (i) When q is fixed, the MLqE is a regular M-estimator [18],
which converges in probability to $\theta^* = \theta_0/q$. (ii) With the explicit expression
of $\theta_n^*$, one may consider correcting the bias of MLqE by using the estimator
$q_n \tilde\theta_n$. The numerical results are not promising in this direction under correct
model specification.



Example 3.1 (Exponential distribution). The surrogate parameter is
$\theta_n^* = \lambda_0/q_n$ and a lengthy but straightforward calculation shows that the
asymptotic variance of the MLqE of $\lambda_0$ is

(3.8)    $\sigma_n^2 = \left(\frac{\lambda_0}{q_n}\right)^2 \left[ \frac{q_n^2 - 2q_n + 2}{q_n^3 (2 - q_n)^3} \right] \to \lambda_0^2$

as $n \to \infty$. By Theorem 3.2, we conclude that $n^{1/2} \sigma_n^{-1} (\tilde\lambda_n - \lambda_0/q_n)$ converges
weakly to a standard normal distribution as $n \to \infty$. Clearly, the asymptotic
calculation does not produce any advantage of MLqE in terms of reducing
the limiting variance. However, for an interval of $q_n$, we have $\sigma_n^2 < \lambda_0^2$ (see
Section 6) and, based on our simulations, an improvement of the accuracy
is achieved in finite sample sizes as long as $0 < q_n - 1 = o(n^{-1/2})$, which
ensures a proper asymptotic normality of $\tilde\lambda_n$. For the re-scaled estimator
$q_n \tilde\lambda_n$, the expression $q_n^2 \sigma_n^2/\lambda_0^2$ is larger than 1 unless $q = 1$, which suggests that
$q_n \tilde\lambda_n$ may be at best no better than $\tilde\lambda_n$.
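The behavior of the variance in Example 3.1 is easy to tabulate. The sketch below assumes the ratio $\sigma_n^2/\lambda_0^2 = (q^2 - 2q + 2)/[q^5 (2 - q)^3]$ quoted in Section 6; the function names are illustrative. It evaluates this ratio and the corresponding ratio $q^2\sigma_n^2/\lambda_0^2$ for the re-scaled estimator.

```python
def variance_ratio(q):
    """sigma_n^2 / lambda_0^2 for the MLqE of the exponential rate, 0 < q < 2."""
    return (q * q - 2 * q + 2) / (q ** 5 * (2 - q) ** 3)


def rescaled_variance_ratio(q):
    """q^2 sigma_n^2 / lambda_0^2, the ratio for the re-scaled estimator."""
    return q * q * variance_ratio(q)


for q in (0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.41):
    print(q, round(variance_ratio(q), 4), round(rescaled_variance_ratio(q), 4))
```

On this grid the first ratio drops below 1 for q slightly above 1 and returns above 1 between 1.40 and 1.41, while the re-scaled ratio stays at or above 1, in line with the remarks of the example.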



Example 3.2 (Multivariate normal distribution). Consider a multivariate
normal family with mean vector $\mu$ and covariance matrix $\Sigma$. Two
convenient matrix operators in this setting are the vec(·) (vector) and vech(·)
(vector-half). Namely, vec: $\mathbb{R}^{r \times p} \to \mathbb{R}^{rp}$ stacks the columns of the
argument matrix. For symmetric matrices, vech: $\mathbb{S}^{p \times p} \to \mathbb{R}^{p(p+1)/2}$ stacks only
the unique part of each column that lies on or below the diagonal [25].
Further, for a symmetric matrix $M$, define the extension matrix $G$ as $\mathrm{vec}\, M = G \, \mathrm{vech}\, M$.
Thus, $\theta_0 = (\mu^T, \mathrm{vech}^T \Sigma)^T$ and under such a parametrization, it is
easy to show the surrogate parameter solving (3.2) is $\theta_n^* = (\mu^T, q_n \mathrm{vech}^T \Sigma)^T$,
where interestingly the mean component does not depend on $q_n$. In fact, for
symmetric distributions about the mean, it can be shown that the distortion
imposed to the model affects the spread of the distribution but leaves
the mean unchanged. Consequently, the MLqE is expected to influence the
estimation of $\Sigma$ without much effect on $\mu$. This will be clearly seen in our
simulation results (see Section 7.3). The calculation in Appendix B shows
that the asymptotic variance of the MLqE of $\theta_0$ is the block-diagonal matrix

/ (2-?) 2+P n \ 



(3.9) V n 



(3-2q) 1 +p/2" 

V[(3-2g) 2 + l](2 







[(2-g) 2 + l] 2 (3-2g)2+P/2 
V x [G^E- 1 ®^- 1 )^]- 1 J 

where $\otimes$ denotes the Kronecker product.

4. A generalization. In this section, we relax the restriction of expo- 
nential family and present consistency and asymptotic normality results for 
MLqE under some regularity conditions. 

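Theorem 4.1. Let $q_n$ be a sequence such that $q_n \to 1$ as $n \to \infty$ and
assume the following:

B.1 $\theta_0$ is an interior point in $\Theta$.

B.2 $E_{\theta_0} \sup_{\theta \in \Theta} \|U(X;\theta)\|^2 < \infty$ and $E_{\theta_0} \sup_{\theta \in \Theta} [f(X;\theta)^{\delta} - 1]^2 \to 0$ as $\delta \to 0$.

B.3 $\sup_{\theta \in \Theta} \left\| \frac{1}{n} \sum_{i=1}^{n} U(X_i;\theta) - E_{\theta_0} U(X;\theta) \right\| \xrightarrow{p} 0$ as $n \to \infty$,

where $\|\cdot\|$ denotes the $\ell_2$-norm. Then, with probability going to 1, the
Lq-likelihood equation yields a unique solution $\tilde\theta_n$ that maximizes the
Lq-likelihood. Furthermore, we have $\tilde\theta_n \xrightarrow{P} \theta_0$.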

Remark 4.2. (i) Although for a large n the Lq-likelihood equation has
a unique zero with a high probability, for finite samples there may be roots
that are actually bad estimates. (ii) The uniform convergence in condition
B.3 is satisfied if the set of functions $\{U(x,\theta) : \theta \in \Theta\}$ is Glivenko-Cantelli
under the true parameter $\theta_0$ (see, e.g., [31], Chapter 19.2). In particular, it
suffices to require (i) $U(x;\theta)$ is continuous in $\theta$ for every $x$ and dominated
by an integrable function and (ii) compactness of $\Theta$.
For each $\theta \in \Theta$, define a symmetric $p \times p$ matrix $I^*(x;\theta,q) = \nabla_\theta U^*(x;\theta,q)$,
where $U^*$ represents the modified score function as in (2.6) and let the
matrices $K_n$, $J_n$ and $V_n$ be as defined in the previous section.

Theorem 4.3. Let $q_n$ be a sequence such that $q_n \to 1$ and $\theta_n^* \to \theta_0$ as
$n \to \infty$, where $\theta_n^*$ is the solution of $E U^*(X;\theta_n^*,q_n) = 0$. Suppose $U^*(x;\theta,q)$
is twice differentiable in $\theta$ for every $x$ and assume the following:

C.1 $\max_{1 \le k \le p} E_{\theta_0} |U_k^*(X,\theta_n^*,q_n)|^3$, $k = 1, \ldots, p$, is upper bounded by a
constant.

C.2 The smallest eigenvalue of $K_n$ is bounded away from zero.

C.3 $E_{\theta_0} \{I^*(X,\theta_n^*,q_n)\}^2_{kl}$, $k, l = 1, \ldots, p$, are upper bounded by a constant.

C.4 The second-order partial derivatives of $U^*(x,\theta,q_n)$ are dominated by an
integrable function with respect to the true distribution of $X$ for all $\theta$
in a neighborhood of $\theta_0$ and $q_n$ in a neighborhood of 1.

Then,

(4.1)    $\sqrt{n} \, V_n^{-1/2} (\tilde\theta_n - \theta_n^*) \xrightarrow{D} N_p(0, I)$ as $n \to \infty$.

5. Estimation of the tail probability. In this section, we address the 
problem of tail probability estimation, using the popular plug-in procedure, 
where the point estimate of the unknown parameter is substituted into the 
parametric function of interest. We focus on a one-dimensional case, that is, 
$p = 1$, and derive the asymptotic distribution of the plug-in estimator for the
tail probability based on the MLq method. For an application of the MLqE
proposed in this work on financial risk estimation, see Ferrari and Paterlini
[12].

Let $\alpha(x;\theta) = P_\theta(X \le x)$ or $\alpha(x;\theta) = 1 - P_\theta(X \le x)$, depending on whether
we are considering the lower tail or the upper tail of the distribution.
Without loss of generality, we focus on the latter from now on, and assume
$\alpha(x;\theta) > 0$ for all $x$ [of course $\alpha(x;\theta) \to 0$ as $x \to \infty$]. When $x$ is fixed,
under some conditions, the familiar delta method shows that an asymptotically
normally distributed and efficient estimator of $\theta$ makes the plug-in
estimator of $\alpha(x;\theta)$ also asymptotically normal and efficient. However, in
most applications a large sample size is demanded in order for this asymptotic
behavior to be accurate for a small tail probability. As a consequence,
the setup with $x$ fixed but $n \to \infty$ presents an overly optimistic view, as it
ignores the possible difficulty due to smallness of the tail probability in relation
to the sample size n. Instead, allowing $x$ to increase in n (so that the tail
probability to be estimated becomes smaller as the sample size increases)
more realistically addresses the problem.
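5.1. Asymptotic normality of the plug-in MLq estimator. We are interested
in estimating $\alpha(x_n;\theta_0)$, where $x_n \to \infty$ as $n \to \infty$. For $\theta^* \in \Theta$ and
$\delta > 0$, define

(5.1)    $\beta(x;\theta^*;\delta) = \sup_{\theta \in \Theta \cap [\theta^* - \delta/\sqrt{n}, \, \theta^* + \delta/\sqrt{n}]} \left| \frac{\alpha''(x;\theta)}{\alpha''(x;\theta^*)} \right|$

and $\gamma(x;\theta) = \alpha''(x;\theta)/\alpha'(x;\theta)$.

Theorem 5.1. Let $\theta_n^*$ be as in the previous section. Under assumptions
A.1 and A.2, if $n^{-1/2} |\gamma(x_n;\theta_n^*)| \beta(x_n;\theta_n^*;\delta) \to 0$ for each $\delta > 0$, then

$\sqrt{n} \, \frac{\alpha(x_n;\tilde\theta_n) - \alpha(x_n;\theta_n^*)}{\sigma_n \alpha'(x_n;\theta_n^*)} \xrightarrow{D} N(0,1),$

where $\sigma_n = -[E_{\theta_0} U^*(X;\theta_n^*)^2]^{1/2} / E_{\theta_0}[\partial U^*(X;\theta,q_n)/\partial\theta |_{\theta_n^*}]$.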

Remarks. (i) The main requirement of the theorem on the order
of the sequence $x_n$ is easiest to verify on a case by case basis. For
instance, in the case of the exponential distribution in (A.4), for $x_n > 0$,

$\beta(x_n;\lambda_n^*;\delta) = \sup_{\lambda \in \lambda_n^* \pm \delta/\sqrt{n}} \frac{x_n^2 e^{-x_n \lambda}}{x_n^2 e^{-x_n \lambda_n^*}} \le \sup_{\lambda \in \lambda_n^* \pm \delta/\sqrt{n}} e^{x_n |\lambda_n^* - \lambda|} = e^{\delta x_n/\sqrt{n}}.$

Moreover, $\gamma(x_n;\lambda_n^*) = -x_n$. So, the condition reads $n^{-1/2} x_n e^{\delta x_n/\sqrt{n}} \to 0$,
that is, $n^{-1/2} x_n \to 0$. (ii) The plug-in estimator based on $q_n \tilde\theta_n$ has been
examined as well. With $q_n \to 1$, we did not find any significant advantage.
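To make the growth restriction on $x_n$ concrete, the following sketch evaluates the quantity $n^{-1/2} x_n e^{\delta x_n/\sqrt{n}}$ from Remark (i) for two hypothetical sequences: $x_n = \log n$, for which it shrinks, and $x_n = \sqrt{n}$, for which it stays at $e^{\delta}$. The function name and the two sequences are illustrative assumptions.

```python
import math


def tail_condition(n, x_n, delta=1.0):
    """Evaluate n^(-1/2) * x_n * exp(delta * x_n / sqrt(n)) for the exponential model."""
    root_n = math.sqrt(n)
    return x_n * math.exp(delta * x_n / root_n) / root_n


for n in (10 ** 2, 10 ** 4, 10 ** 6):
    slow = tail_condition(n, math.log(n))   # x_n = log n: the quantity tends to 0
    fast = tail_condition(n, math.sqrt(n))  # x_n = sqrt(n): the quantity stays at e^delta
    print(n, slow, fast)
```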

The condition $n^{-1/2} |\gamma(x_n;\theta_n^*)| \beta(x_n;\theta_n^*;\delta) \to 0$, to some degree, describes
the interplay between the sample size n, $x_n$ and $q_n$ for the asymptotic
normality to hold. When $x_n \to \infty$ too fast so as to violate the condition, the
asymptotic normality is not guaranteed, which indicates the extreme difficulty
in estimating a tiny tail probability. In the next section, we will use
this framework to compare the MLqE of the tail probability, $\alpha(x_n;\tilde\theta_n)$, with
the one based on the traditional MLE, $\alpha(x_n;\hat\theta_n)$.

In many applications, the quantity of interest is the quantile instead of the
tail probability. In our setting, the quantile function is defined as $\rho(s;\theta) =
\alpha^{-1}(s;\theta)$, $0 < s < 1$ and $\theta \in \Theta$. Next, we present the analogue of Theorem
5.1 for the plug-in estimator of the quantile. Define

(5.2)    $\beta_1(s;\theta^*;\delta) = \sup_{\theta \in \Theta \cap [\theta^* - \delta/\sqrt{n}, \, \theta^* + \delta/\sqrt{n}]} \left| \frac{\rho''(s;\theta)}{\rho''(s;\theta^*)} \right|, \qquad \delta > 0,$

and $\gamma_1(s;\theta) = \rho''(s;\theta)/\rho'(s;\theta)$.
Theorem 5.2. Let $0 < s_n < 1$ be a nonincreasing sequence such that
$s_n \searrow 0$ as $n \to \infty$ and let $\theta_n^*$ and $q_n$ be as in Theorem 5.1. Under assumptions
A.1 and A.2, for a sequence $s_n$ such that $n^{-1/2} |\gamma_1(s_n;\theta_n^*)| \beta_1(s_n;\theta_n^*;\delta) \to 0$
for each $\delta > 0$, we have

$\sqrt{n} \, \frac{\rho(s_n;\tilde\theta_n) - \rho(s_n;\theta_n^*)}{\sigma_n \rho'(s_n;\theta_n^*)} \xrightarrow{D} N(0,1).$

5.2. Relative efficiency between MLE and MLqE. In Section 3, we showed
that when $(q_n - 1)\sqrt{n} \to 0$, the MLqE is asymptotically as efficient as the
MLE. For tail probability estimation, with $x_n \to \infty$, it is unclear if the MLqE
performs efficiently.

Consider $w_n$ and $v_n$, two estimators of a parametric function $g_n(\theta)$ such
that both $\sqrt{n}(w_n - a_n)/\sigma_n$ and $\sqrt{n}(v_n - b_n)/\tau_n$ converge weakly to a
standard normal distribution as $n \to \infty$ for some deterministic sequences $a_n$, $b_n$,
$\sigma_n > 0$ and $\tau_n > 0$.

Definition 5.1. Define

(5.3)    $\Lambda(w_n, v_n) := \frac{(b_n - g_n(\theta))^2 + \tau_n^2/n}{(a_n - g_n(\theta))^2 + \sigma_n^2/n}.$

The bias adjusted asymptotic relative efficiency of $w_n$ with respect to $v_n$ is
$\lim_{n \to \infty} \Lambda(w_n, v_n)$, provided that the limit exists.

It can be easily verified that the definition does not depend on the specific
choice of $a_n$, $b_n$, $\sigma_n$ and $\tau_n$ among equivalent expressions.
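As a small illustration of Definition 5.1, the ratio in (5.3) can be written as a function of the centering and scaling sequences at a given n. The function and argument names below are illustrative choices, and the numerical inputs are made up for the example.

```python
def bias_adjusted_are(a_n, sigma_n, b_n, tau_n, g_n, n):
    """Ratio (5.3): approximate MSE of v_n divided by approximate MSE of w_n.

    a_n, sigma_n: centering and scale of w_n; b_n, tau_n: those of v_n;
    g_n: value of the parametric function being estimated.
    """
    mse_v = (b_n - g_n) ** 2 + tau_n ** 2 / n
    mse_w = (a_n - g_n) ** 2 + sigma_n ** 2 / n
    return mse_v / mse_w


# Identical centering and scale give a ratio of 1.
print(bias_adjusted_are(1.0, 2.0, 1.0, 2.0, 1.0, 50))
# A biased w_n with the same scale as an unbiased v_n gives a ratio below 1.
print(bias_adjusted_are(1.1, 1.0, 1.0, 1.0, 1.0, 100))
```

Values above 1 favor $w_n$ and values below 1 favor $v_n$, since the numerator is the approximate mean squared error of $v_n$.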

Corollary 5.3. Under the conditions of Theorem 5.1, when $q_n$ is chosen
such that

(5.4)    $n^{1/2} \alpha(x_n;\theta_n^*) \alpha(x_n;\theta_0)^{-1} \to 1$ and $\alpha'(x_n;\theta_n^*) \alpha'(x_n;\theta_0)^{-1} \to 1$,

then $\Lambda(\alpha(x_n;\tilde\theta_n), \alpha(x_n;\hat\theta_n)) = 1$.

The result, which follows directly from Theorem 5.1 and Definition 5.1,
says that when $q_n$ is chosen sufficiently close to 1, asymptotically speaking,
the MLqE is as efficient as the MLE.

Example 5.1 (Continued). In this case, we have $\alpha(x_n;\lambda) = e^{-\lambda x_n}$ and
$\alpha'(x_n;\lambda) = -x_n e^{-\lambda x_n}$. For sequences $x_n$ and $q_n$ such that $x_n/\sqrt{n} \to 0$ and
$(q_n - 1)\sqrt{n} \to 0$, we have that

(5.5)    $\sqrt{n} \, \frac{\alpha(x_n;\tilde\lambda_n) - \alpha(x_n;\lambda_0)}{\lambda_0 x_n e^{-\lambda_0 x_n}} \xrightarrow{D} N(0,1).$

When $q_n = 1$ for all n, we recover the usual plug-in estimator based on MLE.
With the asymptotic expressions given above,

(5.6)    $\Lambda(\alpha(x_n;\hat\lambda_n), \alpha(x_n;\tilde\lambda_n)) = \frac{n}{(\lambda_0 x_n)^2} \left( e^{-x_n(\lambda_0/q_n - \lambda_0)} - 1 \right)^2 + e^{-2x_n(\lambda_0/q_n - \lambda_0)},$

which is greater than 1 when $q_n > 1$. Thus, no advantage in terms of MSE
is expected by considering $q_n > 1$ (which introduces bias and enlarges the
variance at the same time).

Although in the limit MLqE is not more efficient than MLE, MLqE can be
much better than MLE due to variance reduction, as will be clearly seen in
Section 7. The following calculation provides a heuristic understanding. Let
$r_n = 1 - 1/q_n$. Add and subtract 1 in (5.6), obtaining

(5.7)    $\frac{n r_n^2}{(\lambda_0 x_n)^2} L_{1/q_n}(e^{x_n \lambda_0})^2 + r_n L_{1/q_n}(e^{2 x_n \lambda_0}) + 1 < n r_n^2 + r_n 2 x_n \lambda_0 + 1,$

where the last inequality holds as $L_{1/q_n}(u) < \log(u)$ for any $u > 0$ and $q < 1$.
Next, we impose (5.7) to be smaller than 1 and solve for $q_n$, obtaining

(5.8)    $T_n := \left( 1 + \frac{2 \lambda_0 x_n}{n} \right)^{-1} < q_n < 1.$

This provides some insights on the choice of the sequence q n in accordance 
to the size of the probability to be estimated. If q n approaches 1 too quickly 
from below, the gain obtained in terms of variance vanishes rapidly as n 
becomes larger. On the other hand, if q n converges to 1 too slowly, the 
bias dominates the variance and the MLE outperforms the MLqE. This
understanding is confirmed in our simulation study. 

6. On the choice of q. For the exponential distribution example, we have
observed the following:

1. For estimating the natural parameter, when $q_n \to 1$, the asymptotic variance
of MLqE is equivalent to that of MLE in the limit, but can be smaller.
For instance, in the variance expression (3.8) one can easily check that
$(q^2 - 2q + 2)/[q^5 (2 - q)^3] < 1$ for $1 < q < 1.40$; thus, choosing the distortion
parameter in such a range gives $\sigma_n^2 < \lambda_0^2$.

2. For estimating the tail probability, when $q_n \to 1$, the asymptotic variance
of MLqE can be of a smaller order than that of MLE, although there is
a bias that approaches 0. In particular:

(i) MLqE cannot be asymptotically more efficient than MLE.

(ii) MLqE is asymptotically as efficient as MLE when $q_n$ is chosen to
be close enough to 1. In the case of tail probability for the exponential
distribution, it suffices to choose $q_n$ such that $(q_n - 1) x_n \to 0$.
3. One approach to choosing q is to minimize an estimated asymptotic mean
squared error of the estimator when it is mathematically tractable. In the
case of the exponential distribution, by Theorem 5.1 we have the following
expression for the asymptotic mean squared error:

(6.1)    $\mathrm{MSE}(q, \lambda_0) = \left( e^{-\lambda_0 x_n/q} - e^{-\lambda_0 x_n} \right)^2 + \frac{(\lambda_0/q)^2 x_n^2 e^{-2\lambda_0 x_n/q}}{n} \left[ \frac{q^2 - 2q + 2}{q^3 (2 - q)^3} \right],$

and the distortion parameter is selected as

(6.2)    $q^* = \arg\min_{q \in (0,1)} \mathrm{MSE}(q, \hat\lambda),$

where $\hat\lambda$ is the MLE. This will be also used in some of our simulation
studies.

In general, unlike in the above example, closed-form expressions of the
asymptotic mean squared error are not available, which calls for more work
on this issue. In the literature on applications of nonextensive entropy,
although some discussions on choosing q have been made, often from physical
considerations, it is unclear how to do it from a statistical perspective. In
particular, the direction of distortion (i.e., $q > 1$ or $q < 1$) needs to be
decided. We offer the following observations and thoughts:

1. For estimating the parameters in an exponential family, although
$|q_n - 1| = o(n^{-1/2})$ guarantees the right asymptotic normality (i.e., asymptotic
normality centered around $\theta_0$), one direction of distortion typically reduces
the variance of estimation and consequently improves the MSE. In the
exponential distribution case, $q_n$ needs to be slightly greater than 1, but for
estimating the covariance matrix for multivariate normal observations,
based on the asymptotic variance formula in Example 3.2, $q_n$ needs to be
slightly smaller than 1. For a given family, the expression of the asymptotic
covariance matrix for the MLqE given in Section 3 can be used to
find the beneficial direction of distortion. Our numerical investigations
confirm this understanding.

2. To minimize the mean squared error for tail probability estimation for
the exponential distribution family, we need $0 < q_n < 1$. This choice is in
the opposite direction to that for estimating the parameter $\lambda$ itself. Thus, the
optimal choice of $q_n$ is not a characteristic of the family alone but also
depends on the parametric function to be estimated.

3. For some parametric functions, the MLqE makes little change. For the
multivariate normal family, the surrogate value of the mean parameter
stays exactly the same while the variance parameters are altered.

4. We have found empirically that given the right distortion direction, choices
of $q_n$ with $|1 - q_n|$ between $1/n$ and $1/\sqrt{n}$ usually improve, to different
extents, over the MLE.
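The data-driven rule described in item 3 above (minimize an estimated asymptotic mean squared error over q) can be sketched for the exponential tail probability as follows. The sketch assumes that the asymptotic MSE is the squared bias $(e^{-\lambda x_n/q} - e^{-\lambda x_n})^2$ plus the variance implied by Theorem 5.1, using the variance ratio $(q^2 - 2q + 2)/[q^5 (2 - q)^3]$ quoted in item 1 of this section. The grid search, function names and numerical inputs are hypothetical and are not the authors' implementation.

```python
import math


def asymptotic_mse(q, lam, x, n):
    """Assumed asymptotic MSE of the plug-in tail estimate exp(-lam * x) under distortion q."""
    bias = math.exp(-lam * x / q) - math.exp(-lam * x)
    # Asymptotic variance of the MLqE of the rate: (lam/q)^2 (q^2-2q+2) / [q^3 (2-q)^3].
    sigma2 = (lam / q) ** 2 * (q * q - 2 * q + 2) / (q ** 3 * (2 - q) ** 3)
    # Delta-method variance of the tail probability at the surrogate rate lam / q.
    variance = sigma2 * x ** 2 * math.exp(-2 * lam * x / q) / n
    return bias ** 2 + variance


def choose_q(lam_hat, x, n, grid_size=999):
    """Grid minimizer of the estimated asymptotic MSE over q in (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda q: asymptotic_mse(q, lam_hat, x, n))


lam_hat, x, n = 1.0, 3.0, 20  # made-up inputs: MLE of the rate, threshold, sample size
q_star = choose_q(lam_hat, x, n)
print(q_star, asymptotic_mse(q_star, lam_hat, x, n), asymptotic_mse(1.0, lam_hat, x, n))
```

In this made-up setting the selected q lies below 1 and its estimated MSE is smaller than the value at $q = 1$, which matches the direction of distortion recommended for tail probabilities.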
7. Monte Carlo results. In this section, the performance of the MLqE
in finite samples is explored via simulations. Our study includes (i) an
assessment of the accuracy for tail probability estimation and reliability of
confidence intervals and (ii) an assessment of the performance of MLqE
for estimating multidimensional parameters, including regression settings
with generalized linear models. The standard MLE is used as a benchmark
throughout the study.

In this section, we present both deterministic and data-driven approaches
to choosing $q_n$. First, deterministic choices are used to explore the possible
advantage of the MLqE for tail probability estimation with $q_n$ approaching
1 fast when $x$ is fixed and $q_n$ approaching 1 slowly when $x$ increases with
$n$. Then, the data-driven choice in Section 6 is applied. For multivariate
normal and GLM families, where estimation of the MSE or prediction error
becomes analytically cumbersome, we choose $q_n = 1 - 1/n$, which satisfies
$1 - q_n = o(n^{-1/2})$, as needed for asymptotic normality around $\theta_0$. In all
considered cases, the numerical solution of (2.7) is found using a variable metric
algorithm (e.g., see Broyden [15]), where the ML solution is chosen as the
starting value.
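As an illustration of how an MLq equation can be solved numerically from the ML starting value, the following is a minimal sketch for the exponential model $f(x;\lambda) = \lambda e^{-\lambda x}$ only. It does not use the variable metric algorithm mentioned above; it swaps in a simple reweighting (fixed-point) iteration, and the function name `mlq_exponential_rate` and its tolerances are hypothetical choices made for this example.

```python
import math
import random

def mlq_exponential_rate(xs, q, tol=1e-10, max_iter=500):
    """Solve sum_i f(x_i; lam)^(1-q) * (1/lam - x_i) = 0 for the exponential rate lam.

    Illustrative fixed-point (reweighting) scheme, not the variable metric
    algorithm used in the paper's simulations.
    """
    n = len(xs)
    lam = n / sum(xs)  # ML solution, used as the starting value
    for _ in range(max_iter):
        # f(x; lam)^(1-q) = lam^(1-q) * exp(-(1-q)*lam*x); the common
        # factor lam^(1-q) cancels from the estimating equation.
        w = [math.exp(-(1.0 - q) * lam * x) for x in xs]
        new_lam = sum(w) / sum(wi * xi for wi, xi in zip(w, xs))
        if abs(new_lam - lam) <= tol * max(1.0, abs(lam)):
            return new_lam
        lam = new_lam
    return lam

if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.expovariate(1.0) for _ in range(25)]
    print("MLE :", len(sample) / sum(sample))
    print("MLqE:", mlq_exponential_rate(sample, q=0.95))
```

With $q = 1$ all weights equal one and the iteration returns the MLE after a single step; for $q < 1$ large observations are downweighted, so the estimated rate is pulled above the MLE.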

7.1. Mean squared error: role of the distortion parameter q. In the first
group of simulations, we compare the estimators of the true tail probability
$\alpha = \alpha(x; \lambda_0)$, obtained via the MLq method and the traditional maximum
likelihood approach. In particular, we are interested in assessing the relative
performance of the two estimators for different choices of the sample size by
taking the ratio between the two mean squared errors, $\mathrm{MSE}(\hat\alpha_n)/\mathrm{MSE}(\tilde\alpha_n)$.
The simulations are structured as follows: (i) For any given sample size $n \ge 2$,
a number $B = 10{,}000$ of Monte Carlo samples $X_1, \ldots, X_n$ is generated from
an exponential distribution with parameter $\lambda_0 = 1$. (ii) For each sample,
the MLq and ML estimates of $\alpha$, respectively, $\tilde\alpha_{n,k} = \alpha(x; \tilde\lambda_{n,k})$ and
$\hat\alpha_{n,k} = \alpha(x; \hat\lambda_{n,k})$, $k = 1, \ldots, B$, are obtained. (iii) For each sample size $n$,
the relative performance between the two estimators is evaluated by the ratio
$R_n = \mathrm{MSE}_{MC}(\hat\alpha_n)/\mathrm{MSE}_{MC}(\tilde\alpha_n)$, where $\mathrm{MSE}_{MC}$ denotes the Monte Carlo estimate
of the mean squared error. In addition, let $y_1 = B^{-1}\sum_{k=1}^{B}(\hat\alpha_{n,k} - \alpha)^2$ and
$y_2 = B^{-1}\sum_{k=1}^{B}(\tilde\alpha_{n,k} - \alpha)^2$. By the central limit theorem, for large values
of $B$, $y = (y_1, y_2)'$ approximately has a bivariate normal distribution with
mean $(\mathrm{MSE}(\hat\alpha_n), \mathrm{MSE}(\tilde\alpha_n))'$ and a certain covariance matrix $\Gamma$. Thus, the
standard error for $R_n$ can be computed by the delta method [11] as

$$\mathrm{se}(R_n) = B^{-1/2}\left(\frac{\hat\gamma_{11}}{y_2^2} - \frac{2\hat\gamma_{12}\,y_1}{y_2^3} + \frac{\hat\gamma_{22}\,y_1^2}{y_2^4}\right)^{1/2},$$

where $\hat\gamma_{11}$, $\hat\gamma_{22}$ and $\hat\gamma_{12}$ denote, respectively, the Monte Carlo estimates for
the components of the covariance matrix $\Gamma$.
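The delta-method step for the ratio of two Monte Carlo mean squared errors can be sketched as follows. This is a generic first-order delta-method calculation for a ratio $y_1/y_2$, written under the assumption that $R_n$ places the MSE of the MLE in the numerator and that of the MLqE in the denominator; the function name `mse_ratio_with_se` and its argument names are hypothetical.

```python
import math

def mse_ratio_with_se(err_mle, err_mlq):
    """Return (R, se) for R = MSE_MC(MLE) / MSE_MC(MLqE).

    err_mle, err_mlq: per-replicate estimation errors (estimate - truth),
    computed on the same B Monte Carlo samples.
    """
    B = len(err_mle)
    s1 = [e * e for e in err_mle]  # squared errors of the MLE
    s2 = [e * e for e in err_mlq]  # squared errors of the MLqE
    y1 = sum(s1) / B
    y2 = sum(s2) / B
    # Sample covariance matrix of the per-replicate squared errors.
    g11 = sum((a - y1) ** 2 for a in s1) / (B - 1)
    g22 = sum((b - y2) ** 2 for b in s2) / (B - 1)
    g12 = sum((a - y1) * (b - y2) for a, b in zip(s1, s2)) / (B - 1)
    ratio = y1 / y2
    # Gradient of y1/y2 is (1/y2, -y1/y2^2); Cov(y1, y2) is Gamma / B.
    var = (g11 / y2**2 - 2.0 * g12 * y1 / y2**3 + g22 * y1**2 / y2**4) / B
    return ratio, math.sqrt(max(var, 0.0))
```

Because both estimators are computed on the same samples, the cross term involving the covariance of the two squared errors matters: strong positive correlation between them makes the ratio much more stable than either MSE alone.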
Case 1: fixed $\alpha$ and $q$. Figure 1 illustrates the behavior of $R_n$ for several
choices of the sample size. In general, we observe that for relatively small
sample sizes, $R_n > 1$ and the MLqE clearly outperforms the traditional MLE.
Such a behavior is much more accentuated for smaller values of the tail
probability to be estimated. In contrast, when the sample size is larger,
the bias component plays an increasingly relevant role and eventually we
observe that $R_n < 1$. This case is presented in Figure 1(a) for values of the
true tail probability $\alpha = 0.01, 0.005, 0.003$ and a fixed distortion parameter
$q = 0.5$. Moreover, the results presented in Figure 1(b) show that smaller
values of the distortion parameter $q$ accentuate the benefits attainable in a
small sample situation.

Case 2: fixed $\alpha$ and $q_n \nearrow 1$. In the second experimental setting, illustrated
in Figure 2(a), the tail probability $\alpha$ is fixed, while we let $q_n$ be a sequence
such that $q_n \nearrow 1$ and $0 < q_n < 1$. For illustrative purposes we choose the
sequence $q_n = [1/2 + e^{0.3(n-20)}]/[1 + e^{0.3(n-20)}]$, $n \ge 2$, and study $R_n$ for
different choices of the true tail probability to be estimated. For small values
of the sample size, the chosen sequence $q_n$ converges relatively slowly to
1 and the distortion parameter produces benefits in terms of variance. In
contrast, when the sample size becomes larger, $q_n$ adjusts quickly to one.
As a consequence, for large samples the MLqE exhibits the same behavior
shown by the traditional MLE.
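To make the shape of this sequence concrete, the sketch below evaluates $q_n = [1/2 + e^{0.3(n-20)}]/[1 + e^{0.3(n-20)}]$ in an algebraically equivalent form, $q_n = 1 - \tfrac12/(1 + e^{0.3(n-20)})$, which avoids overflow for large $n$. The function name `q_sequence` and its keyword arguments are hypothetical.

```python
import math

def q_sequence(n, rate=0.3, center=20):
    """q_n = [1/2 + e^{rate(n-center)}] / [1 + e^{rate(n-center)}]."""
    t = rate * (n - center)
    # Equivalent form: q_n = 1 - (1/2) / (1 + e^t), evaluated stably.
    if t >= 0:
        z = math.exp(-t)
        return 1.0 - 0.5 * z / (1.0 + z)
    return 1.0 - 0.5 / (1.0 + math.exp(t))

if __name__ == "__main__":
    for n in (2, 10, 20, 30, 50, 100):
        print(n, round(q_sequence(n), 6))
```

The sequence starts just above 1/2 for the smallest sample sizes, passes through 3/4 at $n = 20$ and is numerically indistinguishable from 1 well before $n = 100$.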

Case 3: $\alpha_n \searrow 0$ and $q_n \nearrow 1$. The last experimental setting of this
subsection examines the case where both the true tail probability and the distortion
parameter change depending on the sample size. We consider sequences of
distortion parameters converging slowly relative to the sequence of quantiles
$x_n$. In particular we set $q_n = 1 - [10\log(n + 10)]^{-1}$ and $x_n = n^{1/(2+\delta)}$. In the
simulation described in Figure 2(b), we illustrate the behavior of the
estimator for $\delta = 0.5, 1.0$ and $1.5$, confirming the theoretical findings discussed
in Section 5.

Fig. 1. Monte Carlo mean squared error ratio computed from $B = 10{,}000$ samples
of size $n$. In (a) we use a fixed distortion parameter $q = 0.5$ and true tail probability
$\alpha = 0.01, 0.005, 0.003$. The dashed lines represent 99% confidence bands. In (b) we set
$\alpha = 0.003$ and use $q_1 = 0.65$, $q_2 = 0.85$ and $q_3 = 0.95$. The dashed lines represent 90%
confidence bands.

Fig. 2. (a) Monte Carlo mean squared error ratio computed from $B = 10{,}000$ samples of
size $n$, for different values of the true probability: $\alpha_1 = 0.01$, $\alpha_2 = 0.005$ and $\alpha_3 = 0.003$.
The distortion parameter is computed as $q_n = [1/2 + e^{0.3(n-20)}]/[1 + e^{0.3(n-20)}]$. (b) Monte
Carlo mean squared error ratio computed from $B = 10{,}000$ samples of size $n$. We use
sequences $q_n = 1 - [10\log(n + 10)]^{-1}$ and $x_n = n^{1/(2+\delta)}$ ($\delta_1 = 0.5$, $\delta_2 = 1.0$ and $\delta_3 = 1.5$).
The dashed lines represent 99% confidence bands.

7.2. Asymptotic and bootstrap confidence intervals. The main objective
of the simulations presented in this subsection is twofold: (a) to study the
reliability of MLqE-based confidence intervals constructed using three
commonly used methods: asymptotic normality, parametric bootstrap and
nonparametric bootstrap; (b) to compare the results with those obtained using
MLE. The structure of simulations is similar to that of Section 7.1, but a
data-driven choice of $q_n$ is used. (i) For each sample, first we compute $\hat\lambda_n$, the
MLE of $\lambda_0$. We substitute $\hat\lambda_n$ in (6.1) and solve it numerically in order to
obtain $q^*$ as described there. (ii) For each sample, the MLq and ML estimates
of the tail probability $\alpha$ are obtained. The standard errors of the estimates
are computed using three different methods: the asymptotic formula derived
in (5.5), nonparametric bootstrap and parametric bootstrap. The number
of replicates employed in bootstrap resampling is 500. We construct 95%
bootstrap confidence intervals based on the bootstrap quantiles and check
the coverage of the true value $\alpha$.
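The percentile-bootstrap step can be sketched as follows for the exponential tail probability $\alpha(x;\lambda) = e^{-\lambda x}$. This is a minimal illustration of a parametric bootstrap with quantile-based limits, not the authors' simulation code; the names `parametric_bootstrap_ci`, `tail_prob` and `mle_rate` are hypothetical, and any rate estimator (for instance an MLq solver) could be passed in place of the MLE.

```python
import math
import random

def tail_prob(lam, x):
    """Exponential tail probability P(X > x) = exp(-lam * x)."""
    return math.exp(-lam * x)

def mle_rate(xs):
    """Maximum likelihood estimate of the exponential rate."""
    return len(xs) / sum(xs)

def parametric_bootstrap_ci(xs, x, estimator, n_boot=500, level=0.95, seed=0):
    """Percentile interval for the tail probability from a parametric bootstrap."""
    rng = random.Random(seed)
    n = len(xs)
    lam_hat = estimator(xs)
    boot = []
    for _ in range(n_boot):
        # Resample from the fitted exponential model, then re-estimate.
        resample = [rng.expovariate(lam_hat) for _ in range(n)]
        boot.append(tail_prob(estimator(resample), x))
    boot.sort()
    lo_idx = int(math.floor((1.0 - level) / 2.0 * (n_boot - 1)))
    hi_idx = int(math.ceil((1.0 + level) / 2.0 * (n_boot - 1)))
    return tail_prob(lam_hat, x), boot[lo_idx], boot[hi_idx]

if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.expovariate(1.0) for _ in range(50)]
    x0 = -math.log(0.01)  # quantile with true tail probability 0.01 when lambda_0 = 1
    print(parametric_bootstrap_ci(data, x0, mle_rate))
```

A nonparametric version would differ only in the resampling line, drawing $n$ observations from the data with replacement instead of from the fitted model.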
Table 1
MC means and standard deviations of estimators of $\alpha$, along with the MC mean of the
standard error computed using: (i) asymptotic normality, (ii) bootstrap and (iii)
parametric bootstrap. The true tail probability is $\alpha = 0.01$ and $q = 1$ corresponds to the
MLE

  n     q*      Estimate    St. dev.    se          se_boot     se_pboot
  15    0.939   0.009489    0.010975    0.010472    0.011923    0.010241
        1.000   0.013464    0.014830    0.013313    0.013672    0.015090
  25    0.959   0.009693    0.008417    0.008470    0.009134    0.008298
        1.000   0.012108    0.010517    0.009919    0.010227    0.010950
  50    0.977   0.010108    0.006261    0.006326    0.006575    0.006249
        1.000   0.011385    0.007354    0.006894    0.007083    0.007318
  100   0.988   0.010158    0.004480    0.004568    0.004680    0.004549
        1.000   0.010789    0.004908    0.004778    0.004880    0.004943
  500   0.998   0.010006    0.002014    0.002052    0.002061    0.002050
        1.000   0.010122    0.002055    0.002070    0.002073    0.002087


In Table 1, we show the Monte Carlo means of $\hat\alpha_n$ and $\tilde\alpha_n$, their
standard deviations and the standard errors computed with the three methods
described above. In addition, we report the Monte Carlo average of the
estimates of the optimal distortion parameter $q^*$. When $q^* = 1$, the results refer to
the MLE case. Not surprisingly, $q^*$ approaches 1 as the sample size increases.
When the sample size is small, the MLqE has a smaller standard deviation
and better performance. When $n$ is larger, the advantage of the MLqE
diminishes. As far as the standard errors are concerned, the asymptotic method
and the parametric bootstrap seem to provide values somewhat closer to the
Monte Carlo standard deviation for the considered sample sizes.

In Table 2, we compare the accuracy of 95% confidence intervals and
report the relative length of the intervals for MLqE over those for MLE.
Although the coverage probability for MLqE is slightly smaller than that of
MLE (in the order of 1%), we observe a substantial reduction in the interval
length for all of the considered cases. The most evident benefits occur when
the sample size is small. Furthermore, in general, the intervals computed
via parametric bootstrap outperform the other two methods in terms of
coverage and length.

Table 2
MC coverage rate of 95% confidence intervals for $\alpha$, computed using (i) asymptotic
normality, (ii) bootstrap and (iii) parametric bootstrap. RL is the length of the intervals of
MLqE over that of MLE. The true tail probability is $\alpha = 0.01$ and $q = 1$ corresponds to
the MLE

                Asympt.               Boot.                 Par. boot.
  n     q*      Coverage (%)   RL     Coverage (%)   RL     Coverage (%)   RL
  15    0.939   79.2           0.787  89.1           0.865  92.9           0.657
        1.000   80.9                  88.4                  92.5
  25    0.958   83.4           0.854  91.8           0.890  93.6           0.733
        1.000   84.3                  90.8                  94.2
  50    0.977   87.1           0.918  92.3           0.928  93.9           0.824
        1.000   88.4                  91.6                  93.4
  100   0.988   91.1           0.956  93.3           0.960  94.7           0.889
        1.000   92.2                  92.9                  94.3
  500   0.998   94.5           0.991  95.0           0.995  95.2           0.962
        1.000   94.7                  94.6                  94.8

7.3. Multivariate normal distribution. In this subsection, we evaluate
the MLq methodology for estimating the mean and covariance matrix of
a multivariate normal distribution. We generate $B = 10{,}000$ samples from a
multivariate normal $N_p(\mu, \Sigma)$, where $\mu$ is the $p$-dimensional unknown mean
vector and $\Sigma$ is the unknown ($p \times p$) covariance matrix. In our simulation,
the true mean is $\mu = 0$ and the $ij$th element of $\Sigma$ is $\rho^{|i-j|}$, where $-1 < \rho < 1$.

To gauge performance for the mean we employed the usual $L_2$-norm. For
the covariance matrix, we considered the loss function

$$\Delta(\Sigma, \hat\Sigma_q) = \mathrm{tr}(\Sigma^{-1}\hat\Sigma_q - I)^2, \tag{7.1}$$

where $\hat\Sigma_q$ represents the MLq estimate of $\Sigma$ with $q = 1 - 1/n$. Note that
the loss is 0 when $\Sigma = \hat\Sigma_q$ and is positive otherwise. Moreover, the loss is
invariant to the transformations $A\Sigma A^T$ and $A\hat\Sigma_q A^T$ for a nonsingular matrix
$A$. The use of such a loss function is common in the literature (e.g., Huang et
al. [17]).
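The invariance claim can be checked in a few lines. The sketch below assumes that the loss in (7.1) is read as the trace of the squared matrix, $\mathrm{tr}[(\Sigma^{-1}\hat\Sigma_q - I)^2]$, which is the reading under which the loss is strictly positive whenever $\hat\Sigma_q \neq \Sigma$.

```latex
\begin{align*}
\Delta(A\Sigma A^T, A\hat\Sigma_q A^T)
 &= \mathrm{tr}\bigl[\bigl((A\Sigma A^T)^{-1} A\hat\Sigma_q A^T - I\bigr)^2\bigr] \\
 &= \mathrm{tr}\bigl[\bigl(A^{-T}(\Sigma^{-1}\hat\Sigma_q - I)A^{T}\bigr)^2\bigr] \\
 &= \mathrm{tr}\bigl[A^{-T}(\Sigma^{-1}\hat\Sigma_q - I)^2 A^{T}\bigr]
  = \mathrm{tr}\bigl[(\Sigma^{-1}\hat\Sigma_q - I)^2\bigr]
  = \Delta(\Sigma, \hat\Sigma_q).
\end{align*}
```

The second line uses $(A\Sigma A^T)^{-1} = A^{-T}\Sigma^{-1}A^{-1}$ and the last line uses the cyclic property of the trace. Positivity follows because $\Sigma^{-1}\hat\Sigma_q - I$ is similar to the symmetric matrix $\Sigma^{-1/2}\hat\Sigma_q\Sigma^{-1/2} - I$, so the trace of its square is the sum of its squared (real) eigenvalues.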

In Table 3, we show simulation results for moderate or small sample sizes
ranging from 10 to 100 for various dimensions of the covariance matrix $\Sigma$.
The entries in the table represent the Monte Carlo mean of $\Delta(\Sigma, \hat\Sigma_1)$ over
that of $\Delta(\Sigma, \hat\Sigma_q)$, where $\hat\Sigma_1$ is the usual ML estimate multiplied by the
correction factor $n/(n - 1)$. The standard error of the ratio is computed
via the delta method. Clearly, the MLqE performs well for smaller sample
sizes. Interestingly, the squared error for the MLqE reduces dramatically
compared to that of the MLE as the dimension increases. Remarkably, when
$p = 8$ the gain in accuracy persists even for larger sample sizes, ranging
from about 22% to 84%. We tried various structures of $\Sigma$ and obtained
performances comparable to the ones presented. For $\mu$ we found that MLqE
performs nearly identically to MLE for all choices of $p$ and $n$, which is not
surprising given the findings in Section 3. For brevity we omit the results
on $\mu$.

Table 3
Monte Carlo mean of $\Delta(\Sigma, \hat\Sigma_1)$ over that of $\Delta(\Sigma, \hat\Sigma_q)$ with standard
error in parenthesis

                                      p
  n     1               2               4               8
  10    1.225 (0.018)   1.298 (0.019)   1.740 (0.029)   1.804 (0.022)
  15    1.147 (0.014)   1.249 (0.017)   1.506 (0.021)   1.840 (0.026)
  25    1.083 (0.011)   1.153 (0.012)   1.313 (0.016)   1.562 (0.020)
  50    1.041 (0.007)   1.052 (0.007)   1.199 (0.011)   1.377 (0.015)
  100   1.018 (0.005)   1.033 (0.005)   1.051 (0.006)   1.222 (0.011)

7.4. Generalized linear models. Our methodology can be readily
extended to the popular framework of generalized linear models.
Consider the regression setting where each outcome of the dependent variable,
$Y$, is drawn from a distribution in the exponential family. The mean $\mu$ of
the distribution is assumed to depend on the independent variables, $X$,
through $E(Y|X) = \mu = g^{-1}(X^T\beta)$, where $X$ is the design matrix, $\beta$ is a
$p$-dimensional vector of unknown parameters and $g$ is the link function. In our
simulations, we consider two notable instances: (i) $Y$ from an exponential
distribution with $\mu = \exp(-x^T\beta)$; (ii) $Y$ from a Bernoulli distribution with
$\mu = 1/(1 + \exp\{x^T\beta\})$. The first case represents the exponential regression
model, which is a basic setup for time-to-event analysis. The latter is the
popular logistic regression model.

We initialize the simulations by generating design points randomly drawn
from the unit hypercube $[-1, 1]^p$. The entries of the true vector of coefficients
$\beta$ are assigned by sampling $p$ points at random in the interval $[-1, 1]$, obtaining
values $\beta = (-0.57, 0.94, 0.16, -0.72, 0.68, 0.92, 0.80, 0.04, 0.64, 0.34, 0.38, 0.47)$.
The values of $X$ and $\beta$ are kept fixed during the simulations. Then, 1000
Monte Carlo samples of $Y|X$ are generated according to the two models
described above and for each sample MLq and ML estimates are computed.
The prediction error based on independent out-of-sample observations is

$$\mathrm{PE}_q = \frac{1}{10^3}\sum_{j=1}^{10^3}\bigl(Y_j^{\mathrm{test}} - g^{-1}\bigl((X_j^{\mathrm{test}})^T\hat\beta_{q,n}\bigr)\bigr)^2, \tag{7.2}$$

where $\hat\beta_{q,n}$ is the MLqE of $\beta$. In Table 4 we present the prediction error
for various choices of $n$ and $p$. For both models, the MLqE outperforms
the classic MLE for all considered cases. The benefits from MLqE can be
remarkable when the dimension of the parameter space is larger. This is
particularly evident in the case of the exponential regression, where the
prediction error of MLE is at least twice that of MLqE. In one case, when
$n = 25$ and $p = 12$, the MLqE is about nine times more accurate. This is
mainly due to the MLqE's stabilization of the variance component, which for
the MLE tends to become large quickly when $n$ is very small compared to $p$.



Table 4
Monte Carlo mean of $\mathrm{PE}_1$ over that of $\mathrm{PE}_q$ for exponential and logistic regression
with standard error in parenthesis

                                      n
  p     25              50              100             250

  Exp. regression
  2     2.549 (0.003)   2.410 (0.002)   2.500 (0.003)   2.534 (0.003)
  4     2.469 (0.002)   2.392 (0.002)   2.543 (0.002)   2.493 (0.002)
  8     4.262 (0.012)   2.941 (0.004)   3.547 (0.006)   3.582 (0.006)
  12    9.295 (0.120)   3.644 (0.008)   3.322 (0.005)   5.259 (0.027)

  Logistic regression
  2     1.156 (0.006)   1.329 (0.006)   1.205 (0.003)   1.385 (0.003)
  4     1.484 (0.022)   1.141 (0.003)   1.502 (0.007)   1.353 (0.003)
  8     1.178 (0.008)   1.132 (0.003)   1.290 (0.004)   1.300 (0.002)
  12    1.086 (0.005)   1.141 (0.003)   1.227 (0.003)   1.329 (0.002)

Although for the logistic regression we observe a similar behavior, the gain 
in high dimension becomes more evident for larger n. 

8. Concluding remarks. In this work, we have introduced the MLqE,
a new parametric estimator inspired by a class of generalized information
measures that have been successfully used in several scientific disciplines.
The MLqE may also be viewed as a natural extension of the classical MLE.
It can preserve the large sample properties of the MLE while, by means
of a distortion parameter $q$, allowing modification of the trade-off between
bias and variance in small or moderate sample situations. The Monte Carlo
simulations support that when the sample size is small or moderate, the
MLqE can successfully trade bias for variance, obtaining a reduction of the
mean squared error, sometimes very dramatically.

Overall, this work makes a significant contribution to parametric
estimation and applications of nonextensive entropies. For parametric models,
MLE is by far the most commonly used estimator and the substantial
improvement as seen in our numerical work seems relevant and important to
applications. Given the increasing attention to $q$-entropy in other closely
related disciplines, our theoretical results provide a useful view from a
statistical perspective. For instance, in the literature, although $q$ is chosen
from interesting physical considerations, for statistical estimation (e.g., for
financial data analysis where $q$-entropy is considered), there are few clues as
to how to choose the direction and amount of distortion.

Besides the theoretical optimality results and often remarkably improved
performance over MLE, our proposed method is very practical in terms of
implementability and computational efficiency. The estimating equations are
simply obtained by replacing the logarithm in the log-likelihood function of the
usual maximum likelihood procedure by the distorted logarithm. Thus, the
resulting optimization task can be easily formulated in terms of a weighted
version of the familiar score function, with weights proportional to the
$(1 - q)$th power of the assumed density. Hence, similarly to other techniques
based on reweighting of the likelihood, simple and fast algorithms for solving
the MLq equations numerically (possibly even for large problems) can be
derived.
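For reference, the weighting just described follows directly from differentiating the distorted logarithm. The sketch below assumes the usual definition $L_q(u) = (u^{1-q} - 1)/(1 - q)$ for $q \neq 1$ (with $L_1 = \log$) and writes $U(x;\theta) = \nabla_\theta \log f(x;\theta)$ for the score function.

```latex
\begin{align*}
\nabla_\theta L_q\bigl(f(x;\theta)\bigr)
  &= f(x;\theta)^{-q}\,\nabla_\theta f(x;\theta)
   = f(x;\theta)^{1-q}\,U(x;\theta), \\
\sum_{i=1}^n f(X_i;\theta)^{1-q}\,U(X_i;\theta) &= 0 .
\end{align*}
```

Setting $q = 1$ makes every weight equal to one and recovers the ordinary likelihood equation; for $q < 1$ observations with low density under the assumed model receive smaller weight.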

For the MLq estimators, helpful insights on their behaviors may be gained
from robust analysis. For a given $q$, (2.6) defines an M-estimator of the
surrogate parameter $\theta^*$. It seems that global robustness properties, such as
a high breakdown point, may be established for a properly chosen distortion
parameter, which would add value to the MLq methodology.

High-dimensional estimation has recently become a central theme in
statistics. The results in this work suggest that the MLq methodology may be
a valuable tool for some high-dimensional estimation problems (such as
gamma regression and covariance matrix estimation as demonstrated in this
paper) as a powerful remedy to the MLE. We believe this is an interesting
direction for further exploration.

Finally, more research on the practical choices of $q$ and their theoretical
properties will be valuable. To this end, higher-order asymptotic treatment
of the distribution (or moments) of the MLqE will be helpful. For instance,
derivation of saddle-point approximations of order $n^{-3/2}$, along the lines of
Field and Ronchetti [13] and Daniels [10], may be profitably used to give
improved approximations of the MSE.

APPENDIX A: PROOFS 

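In all of the following proofs we denote $\psi_n(\theta) := n^{-1}\sum_{i=1}^n \nabla_\theta L_{q_n}(f(X_i;\theta))$.
For exponential families, since $f(x;\theta) = e^{\theta^T b(x) - A(\theta)}$, we have

$$\psi_n(\theta) = \frac{1}{n}\sum_{i=1}^n e^{(1-q_n)(\theta^T b(X_i) - A(\theta))}\,(b(X_i) - m(\theta)), \tag{A.1}$$

where $m(\theta) = \nabla_\theta A(\theta)$. The MLq equation sets $\psi_n(\theta) = 0$ and solves for
$\theta$. Moreover, we define $\varphi(x,\theta) := \theta^T b(x) - A(\theta)$, and thus $f(x;\theta) = e^{\varphi(x,\theta)}$.
When clear from the context, $\varphi(x,\theta)$ is denoted by $\varphi$.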

Proof of Theorem 3.1. Define $\psi(\theta) := E_{\theta_0}\nabla_\theta \log(f(X;\theta))$. Since $f$ has
the form in (3.1), we can write $\psi(\theta) = E_{\theta_0}[b(X) - m(\theta)]$. We want to show
uniform convergence of $\psi_n(\theta)$ to $\psi(\theta)$ for all $\theta \in \Theta$ in probability. Clearly,

$$\sup_{\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n e^{(1-q_n)(\theta^T b(X_i) - A(\theta))}(b(X_i) - m(\theta)) - E_{\theta_0}[b(X) - m(\theta)]\Bigr\|_1$$

$$\le \sup_{\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n \bigl(e^{(1-q_n)(\theta^T b(X_i) - A(\theta))} - 1\bigr)(b(X_i) - m(\theta))\Bigr\|_1 + \sup_{\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n (b(X_i) - m(\theta)) - E_{\theta_0}[b(X) - m(\theta)]\Bigr\|_1, \tag{A.2}$$

where $\|\cdot\|_1$ denotes the $\ell_1$-norm. Note that the second summand in (A.2)
actually does not depend on $\theta$ [as $m(\theta)$ cancels out] and it converges to zero in
probability by the law of large numbers. Next, let $s(X_i;\theta) := e^{(1-q_n)[\theta^T b(X_i) - A(\theta)]} - 1$
and $t(X_i;\theta) := b(X_i) - m(\theta)$. By the Cauchy-Schwarz inequality, the first
summand in (A.2) is upper bounded by

$$\sup_{\theta\in\Theta}\sum_{j=1}^p \sqrt{\frac{1}{n}\sum_{i=1}^n s(X_i;\theta)^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^n t_j(X_i;\theta)^2}, \tag{A.3}$$

where $t_j$ denotes the $j$th element of the vector $t(X_i;\theta)$. It follows that
for (A.2), it suffices to show that $n^{-1}\sum_i \sup_\theta s(X_i;\theta)^2 \to_p 0$ and that $n^{-1}\sum_i \sup_\theta t_j(X_i;\theta)^2$
is bounded in probability. Since $\Theta$ is compact, $\sup_\theta |m(\theta)| \le (c_1, c_1, \ldots, c_1)$
for some positive constant $c_1 < \infty$, and we have

$$\frac{1}{n}\sum_{i=1}^n \sup_\theta t_j(X_i;\theta)^2 \le \frac{2}{n}\sum_{i=1}^n b_j(X_i)^2 + 2(c_1)^2, \tag{A.4}$$

where the last inequality follows from the basic fact that $(a - b)^2 \le 2a^2 + 2b^2$
($a, b \in \mathbb{R}$). The last expression in (A.4) is bounded in probability by some
constant since $E_{\theta_0} b_j(X)^2 < \infty$ for all $j = 1, \ldots, p$. Next, note that

$$\frac{1}{n}\sum_{i=1}^n \sup_\theta s(X_i;\theta)^2 \le \frac{1}{n}\sum_{i=1}^n \sup_\theta e^{2(1-q_n)(\theta^T b(X_i) - A(\theta))} - \frac{2}{n}\sum_{i=1}^n \inf_\theta e^{(1-q_n)(\theta^T b(X_i) - A(\theta))} + 1. \tag{A.5}$$

Thus, to show $n^{-1}\sum_i \sup_\theta s(X_i;\theta)^2 \to_p 0$, it suffices to obtain
$n^{-1}\sum_i \sup_\theta e^{2(1-q_n)\varphi(X_i,\theta)} - 1 \to_p 0$ and $n^{-1}\sum_i \inf_\theta e^{(1-q_n)\varphi(X_i,\theta)} - 1 \to_p 0$. Actually,
since $\Theta$ is compact and $\sup_\theta e^{-A(\theta)} \le c_2$ for some $c_2 < \infty$,

$$\frac{1}{n}\sum_{i=1}^n \sup_\theta e^{2(1-q_n)(\theta^T b(X_i) - A(\theta))} \le \frac{1}{n}\sum_{i=1}^n e^{2|1-q_n|(|\log c_2| + \theta^{(*)T}|b(X_i)|)}, \tag{A.6}$$

where $\theta_j^{(*)} = \max\{|\theta_{j,0}^{(*)}|, |\theta_{j,1}^{(*)}|\}$, $j = 1, \ldots, p$, and $(\theta_{j,0}^{(*)}, \theta_{j,1}^{(*)})$ represent
elementwise boundary points of $\theta_j$. For $r = 1, 2$,

$$E_{\theta_0}\bigl[e^{2|1-q_n|(|\log c_2| + \theta^{(*)T}|b(X)|)}\bigr]^r \tag{A.7}$$

$$= e^{2r|1-q_n||\log c_2| - A(\theta_0)}\int e^{[2r|1-q_n|\,\mathrm{sign}\{b(x)\}\theta^{(*)} + \theta_0]^T b(x)}\,d\mu(x). \tag{A.8}$$

We decompose $\Omega$ into $2^p$ subsets in terms of the sign of the elements of $b(x)$.
That is, $\Omega = \bigcup_{k=1}^{2^p} B_k$, where

$$B_1 = \{x \in \Omega : b_1(x) \ge 0, b_2(x) \ge 0, \ldots, b_{p-1}(x) \ge 0, b_p(x) \ge 0\},$$
$$B_2 = \{x \in \Omega : b_1(x) \ge 0, b_2(x) \ge 0, \ldots, b_{p-1}(x) \ge 0, b_p(x) < 0\}, \tag{A.9}$$
$$B_3 = \{x \in \Omega : b_1(x) \ge 0, b_2(x) \ge 0, \ldots, b_{p-1}(x) < 0, b_p(x) \ge 0\}$$

and so on. Note that $\mathrm{sign}\{b(x)\}$ stays the same for each $B_i$, $i = 1, \ldots, 2^p$. Also
because $\theta_0$ is an interior point, when $|1 - q_n|$ is small enough, the integral
in (A.7) on $B_i$ is finite and by the dominated convergence theorem,

$$\int_{B_k} e^{[2r|1-q_n|\,\mathrm{sign}\{b(x)\}\theta^{(*)} + \theta_0]^T b(x)}\,d\mu(x) \to \int_{B_k} e^{\theta_0^T b(x)}\,d\mu(x) \quad \text{as } n \to \infty. \tag{A.10}$$

Consequently,

$$\int e^{[2r|1-q_n|\,\mathrm{sign}\{b(x)\}\theta^{(*)} + \theta_0]^T b(x) - A(\theta_0)}\,d\mu(x) \to \int e^{\theta_0^T b(x) - A(\theta_0)}\,d\mu(x) = 1. \tag{A.11}$$

It follows that the mean and the variance of $\sup_\theta e^{2(1-q_n)[\theta^T b(X) - A(\theta)]}$
converge to 1 and 0, respectively, as $n \to \infty$. Therefore, a straightforward
application of Chebyshev's inequality gives

$$\frac{1}{n}\sum_{i=1}^n \sup_\theta e^{2(1-q_n)[\theta^T b(X_i) - A(\theta)]} \to_p 1, \quad n \to \infty. \tag{A.12}$$

An analogous argument shows that

$$\frac{1}{n}\sum_{i=1}^n \inf_\theta e^{(1-q_n)[\theta^T b(X_i) - A(\theta)]} \to_p 1, \quad n \to \infty. \tag{A.13}$$

Therefore, we have established $n^{-1}\sum_i \sup_\theta s(X_i;\theta)^2 \to_p 0$. Hence, (A.2)
converges to zero in probability. By applying Lemma 5.9 on page 46 in [31],
we know that with probability converging to 1, the solution of the MLq
equations is unique and it maximizes the Lq-likelihood.

Proof of Theorem 3.2. By Taylor's theorem, there exists a random point
$\bar\theta$ in the line segment between $\theta_n^*$ and $\tilde\theta_n$, such that with probability
converging to one we have

$$0 = \psi_n(X;\tilde\theta_n) = \psi_n(X;\theta_n^*) + \dot\psi_n(X;\theta_n^*)(\tilde\theta_n - \theta_n^*) + \tfrac{1}{2}(\tilde\theta_n - \theta_n^*)^T\ddot\psi_n(X;\bar\theta)(\tilde\theta_n - \theta_n^*), \tag{A.14}$$

where $\dot\psi_n$ is a $p \times p$ matrix of first-order derivatives and, similarly to page
68 in van der Vaart [31], $\ddot\psi_n$ denotes a $p$-vector of ($p \times p$) matrices of
second-order derivatives; $X$ denotes the data vector. We can rewrite
the above expression as

$$-\sqrt{n}\,\dot\psi(\theta_n^*)^{-1}\psi_n(X;\theta_n^*) = \dot\psi(\theta_n^*)^{-1}\dot\psi_n(X;\theta_n^*)\sqrt{n}(\tilde\theta_n - \theta_n^*) \tag{A.15}$$

$$\qquad + \sqrt{n}\,\dot\psi(\theta_n^*)^{-1}\tfrac{1}{2}(\tilde\theta_n - \theta_n^*)^T\ddot\psi_n(X;\bar\theta)(\tilde\theta_n - \theta_n^*), \tag{A.16}$$

where $\dot\psi(\theta) = E_{\theta_0}\nabla_\theta^2 L_{q_n} f(X;\theta)$. Note that

$$\dot\psi(\theta) = E_{\theta_0} e^{(1-q_n)\varphi(\theta)}\bigl[(1 - q_n)\nabla_\theta\varphi(\theta)^T\nabla_\theta\varphi(\theta) - \nabla_\theta^2 A(\theta)\bigr] \tag{A.17}$$

$$= K_{1,n}\,E_{\mu_{n,1}}\bigl[(1 - q_n)\nabla_\theta\varphi(\theta)^T\nabla_\theta\varphi(\theta) - \nabla_\theta^2 A(\theta)\bigr], \tag{A.18}$$

where $\mu_{k,n} = k(1 - q_n)\theta + \theta_0$ and $K_{k,n} = e^{A(\mu_{n,k}) - A(\theta_0)}$. For $k, l \in \{1, \ldots, p\}$,
we have

$$\bigl\{E_{\mu_{n,1}}[\nabla_\theta\varphi(\theta)^T\nabla_\theta\varphi(\theta)]\bigr\}_{kl} = E_{\mu_{n,1}}[(b_k(X) - m_k(\theta))(b_l(X) - m_l(\theta))] \tag{A.19}$$

$$= E_{\mu_{n,1}}[(b_k(X) - m_k(\mu_{n,1}) + m_k(\mu_{n,1}) - m_k(\theta)) \tag{A.20}$$

$$\qquad \times (b_l(X) - m_l(\mu_{n,1}) + m_l(\mu_{n,1}) - m_l(\theta))] \tag{A.21}$$

$$= E_{\mu_{n,1}}[(b_k(X) - m_k(\mu_{n,1}))(b_l(X) - m_l(\mu_{n,1}))] \tag{A.22}$$

$$\qquad + [(m_k(\mu_{n,1}) - m_k(\theta))(m_l(\mu_{n,1}) - m_l(\theta))], \tag{A.23}$$

where the first term in the last passage is the $kl$th element of the covariance
matrix $-D(\theta)$ evaluated at $\mu_{n,1}$. Since $\Theta$ is compact, $\{\dot\psi(\theta)\}_{kl} \le C_{kl}^* < \infty$,
for some constants $C_{kl}^*$, $k, l \in \{1, \ldots, p\}$. We take the following steps to derive
asymptotic normality.

Step 1. We first show that the left-hand side of (A.16) converges in
distribution. Define the vector $Z_{n,i} := \nabla_\theta L_{q_n} f(X_i,\theta_n^*) - E_{\theta_0}\nabla_\theta L_{q_n} f(X_i,\theta_n^*)$ in
$\mathbb{R}^p$. Consider an arbitrary vector $a \in \mathbb{R}^p$ and let $W_{n,i} := a^T Z_{n,i}$ and
$\bar W_n = n^{-1}\sum_i W_{n,i}$. Since $W_{n,i}$ ($1 \le i \le n$) form a triangular array where $W_{n,i}$ are
rowwise i.i.d., we check the Lyapunov condition. In our case, the condition
reads

$$n^{-1/3}(E W_{n,1}^2)^{-1}(E[W_{n,1}^3])^{2/3} \to 0 \quad \text{as } n \to \infty. \tag{A.24}$$

Next, denote $\mu_{n,k} = \theta_0 + k(1 - q_n)\theta_n^*$. One can see that

$$(E[W_{n,1}^3])^{2/3} = K_n\Bigl[E_{\mu_{n,3}}\Bigl(\sum_{j=1}^p a_j(b_j(X) - m_j(\theta_n^*))\Bigr)^3\Bigr]^{2/3},$$

where $K_n = \exp\{-\tfrac{2}{3}A(\theta_0) - 2(1 - q_n)A(\theta_n^*) + \tfrac{2}{3}A(\mu_{n,3})\}$ and $K_n \to 1$ as
$n \to \infty$. Since $\theta_0$ is an interior point in $\Theta$ (compact) the above quantity
is uniformly upper bounded in $n$ by some finite constant. Next, consider

$$E[W_{n,1}^2] = E[a^T Z_{n,1} Z_{n,1}^T a] = a^T E[Z_{n,1} Z_{n,1}^T]\,a.$$

A calculation similar to that in (A.23) for the matrix $Z_{n,1} Z_{n,1}^T$ shows that
the above quantity satisfies

$$a^T[-D(\mu_{n,2}) + M_n]\,a \to -a^T D(\theta_0)\,a > 0, \quad n \to \infty, \tag{A.25}$$

where the $kl$th element of $M_n$ is

$$\{M_n\}_{kl} = (m_k(\mu_{n,2}) - m_k(\theta_n^*))(m_l(\mu_{n,2}) - m_l(\theta_n^*)) \tag{A.26}$$

and $\mu_{n,2} \to \theta_0$ and $\theta_n^* \to \theta_0$, as $n \to \infty$. This shows that condition (A.24)
holds and $\sqrt{n}(E[W_{n,1}^2])^{-1/2}\bar W_n \to_d N_1(0, 1)$. Hence, by the Cramér-Wold
device (e.g., see [31]), we have

$$\sqrt{n}\,[E Z_{n,1} Z_{n,1}^T]^{-1/2}\bar Z_n \to_d N_p(0, I_p). \tag{A.27}$$

Step 2. Next, we want convergence in probability of $\dot\psi(\theta_n^*)^{-1}\dot\psi_n(X,\theta_n^*)$ to
$I_p$. For $k, l \in \{1, \ldots, p\}$, given $\varepsilon > 0$, we have

$$P_{\theta_0}\bigl(|\{\dot\psi_n(X,\theta_n^*)\}_{k,l} - \{\dot\psi(\theta_n^*)\}_{k,l}| > \varepsilon\bigr) \le n^{-1}\varepsilon^{-2}E_{\theta_0}\Bigl[\frac{\partial^2}{\partial\theta_k\,\partial\theta_l}L_{q_n} f(X;\theta_n^*)\Bigr]^2 \tag{A.28}$$

by the i.i.d. assumption and Chebyshev's inequality. When $|1 - q_n| < 1$, the
expectation in (A.28) is

$$E_{\theta_0}\bigl[e^{2(1-q_n)\varphi(\theta_n^*)}\bigl[(1 - q_n)(b_k(X) - m_k(\theta_n^*))(b_l(X) - m_l(\theta_n^*)) + D(\theta_n^*)_{kl}\bigr]^2\bigr]$$

$$\le 2E_{\mu_{n,2}}\bigl[(b_k(X) - m_k(\theta_n^*))^2(b_l(X) - m_l(\theta_n^*))^2 + D(\theta_n^*)_{kl}^2\bigr] \times \exp\{-A(\theta_0) - 2(1 - q_n)A(\theta_n^*) + A(\mu_{n,2})\},$$

where the inequality passage follows from the triangle inequality. Since $\Theta$
is compact and the existence of fourth moments is ensured for exponential
families, the above quantity is upper bounded by some finite constant.
Therefore, the right-hand side of (A.28) is upper bounded by a quantity
that converges to zero as $n \to \infty$. Since convergence in probability holds
for each $k, l \in \{1, \ldots, p\}$ and $p < \infty$, we have that the matrix difference
$|\dot\psi_n(X,\theta_n^*) - \dot\psi(\theta_n^*)|$ converges in probability to the zero matrix. From the
calculation carried out in (A.17), one can see that $\dot\psi(\theta_n^*)$ is a deterministic
sequence such that $\dot\psi(\theta_n^*) \to \dot\psi(\theta_0) = -\nabla_\theta^2 A(\theta_0)$, as $n \to \infty$. Thus, we have

$$|\dot\psi_n(X;\theta_n^*) - \dot\psi(\theta_0)| \le |\dot\psi_n(X;\theta_n^*) - \dot\psi(\theta_n^*)| + |\dot\psi(\theta_n^*) - \dot\psi(\theta_0)| \to_p 0 \tag{A.29}$$

as $n \to \infty$. Therefore, $\dot\psi(\theta_n^*)^{-1}\dot\psi_n(X,\theta_n^*) \to_p I_p$.

Step 3. Here, we show that the second term on the right-hand side of
(A.16) is negligible. Let $g(X;\theta)$ be an element of the array $\ddot\psi_n(X;\theta)$ of
dimension $p \times p \times p$. For some fixed $\bar\theta$ in the line segment between $\theta$ and
$\theta_n^*$ we have that

$$|g(X;\theta) - g(X;\theta_n^*)| = |\nabla_\theta g(X,\bar\theta)^T(\theta - \theta_n^*)| \le \sup_{\theta\in\Theta}|\nabla_\theta g(X,\theta)|\,|\theta - \theta_n^*|. \tag{A.30}$$

A calculation shows that the $h$th element of the gradient vector in the
expression above is

$$\{\nabla_\theta g(X,\theta)\}_h = \frac{1}{n}\sum_{i=1}^n e^{(1-q_n)\varphi(X_i,\theta)}\bigl[(1 - q_n)^3\varphi(\theta)^{(1)} + (1 - q_n)^2\varphi(\theta)^{(2)} + (1 - q_n)\varphi(\theta)^{(3)} + \varphi(\theta)^{(4)}\bigr] \tag{A.31}$$

for $h \in \{1, \ldots, p\}$, where $\varphi^{(k)}$ denotes the product of the partial derivatives
of order $k$ with respect to $\theta$. As shown before in the proof of Theorem 3.1,
$\sup_\theta e^{(1-q_n)\varphi(X,\theta)}$ has finite expectation when $|1 - q_n|$ is small enough. Thus,
by Markov's inequality, $\sup_\theta |g(X,\theta)|$ is bounded in probability. In addition,
recall that the deterministic sequence $\dot\psi(\theta_n^*)$ converges to a constant. Hence,
$\dot\psi(\theta_n^*)^{-1}\ddot\psi_n(X;\bar\theta)$ is bounded in probability.

Since the third term in the expansion (A.16) is of higher order than the
second term, by combining steps 1, 2 and 3 and applying Slutsky's lemma
we obtain the desired asymptotic normality result.

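Proof of Theorem 4.1. Uniform convergence of $\psi_n(\theta)$ to $\psi(\theta)$ for all $\theta \in \Theta$
in probability is satisfied if

$$\sup_{\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n f(X_i;\theta)^{1-q_n}U(X_i;\theta) - E_{\theta_0}U(X;\theta)\Bigr\|_1 \to_p 0.$$

The left-hand side of the above expression is upper bounded by

$$\sup_{\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n \bigl(f(X_i;\theta)^{1-q_n} - 1\bigr)U(X_i;\theta)\Bigr\|_1 + \sup_{\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n U(X_i;\theta) - E_{\theta_0}U(X;\theta)\Bigr\|_1. \tag{A.32}$$

By the Cauchy-Schwarz inequality, the first term of the above expression is
upper bounded by

$$\sup_{\theta\in\Theta}\sum_{j=1}^p \sqrt{\frac{1}{n}\sum_{i=1}^n \bigl(f(X_i;\theta)^{1-q_n} - 1\bigr)^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^n U_j(X_i;\theta)^2}.$$

By assumption B.2, $n^{-1}\sum_i \sup_\theta U_j(X_i;\theta)^2$ is bounded in probability.
Moreover, given $\varepsilon > 0$, by Markov's inequality we have

$$P\Bigl(n^{-1}\sum_{i=1}^n \sup_\theta \bigl(f(X_i;\theta)^{1-q_n} - 1\bigr)^2 > \varepsilon\Bigr) \le \varepsilon^{-1}E_{\theta_0}\sup_\theta \bigl(f(X;\theta)^{1-q_n} - 1\bigr)^2,$$

which converges to zero by assumption B.2. By assumption B.3, the second
summand in (A.32) converges to zero in probability.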

Proof of Theorem 4.3. By Taylor's theorem, for a solution of the MLq
equation, there exists a random point $\bar\theta$ between $\tilde\theta_n$ and $\theta_n^*$ such that

$$0 = \frac{1}{n}\sum_{i=1}^n U^*(X_i,\theta_n^*,q_n) + \frac{1}{n}\sum_{i=1}^n \nabla_\theta U^*(X_i,\theta_n^*,q_n)(\tilde\theta_n - \theta_n^*) + \frac{1}{2}(\tilde\theta_n - \theta_n^*)^T\frac{1}{n}\sum_{i=1}^n \nabla_\theta^2 U^*(X_i,\bar\theta,q_n)(\tilde\theta_n - \theta_n^*). \tag{A.33}$$

From Theorem 4.1, we know that with probability approaching 1, $\tilde\theta_n$ is the
unique MLqE and the above equation holds. Define $Z_{n,i} := U^*(X_i,\theta_n^*,q_n)$,
$i = 1, \ldots, n$, a triangular array of i.i.d. random vectors and let $a \in \mathbb{R}^p$ be a
vector of constants. Let $W_{n,i} := a^T Z_{n,i}$. The Lyapunov condition for ensuring
asymptotic normality of the linear combination $a^T\sum_{i=1}^n Z_{n,i}/n$ for $a \in \mathbb{R}^p$
and $\|a\| > 0$ in this case reads

$$n^{-1/3}(E W_{n,1}^2)^{-1}(E[W_{n,1}^3])^{2/3} \to 0 \quad \text{as } n \to \infty.$$

Under C.1 and C.2, this can be easily checked. The Cramér-Wold device
implies

$$C_n\,\frac{1}{n}\sum_{i=1}^n U^*(X_i,\theta_n^*,q_n) \to_d N_p(0, I_p),$$

where $C_n := \sqrt{n}\,[E_{\theta_0}U^*(X,\theta_n^*)^T U^*(X,\theta_n^*)]^{-1/2}$.

Next, consider the second term in (A.33). Given $\varepsilon > 0$, for $k, l \in \{1, \ldots, p\}$,
by Chebyshev's inequality

$$P\Bigl(\Bigl|\frac{1}{n}\sum_{i=1}^n \{I^*(X_i,\theta_n^*,q_n)\}_{k,l} - \{J_n\}_{k,l}\Bigr| > \varepsilon\Bigr) \le \varepsilon^{-2}n^{-1}E\{I^*(X,\theta_n^*,q_n)\}_{k,l}^2.$$

Thus, the right-hand side of the above expression converges to zero as $n \to \infty$
under C.3. Since convergence in probability is ensured for each $k, l$ and
$p < \infty$, under C.2, we have that $|n^{-1}\sum_i I^*(X_i,\theta_n^*) - J_n|$ converges to the
zero matrix in probability.

Finally, $n^{-1}\nabla_\theta^2\sum_{i=1}^n U^*(X_i,\bar\theta,q_n)$ in the third term of the expansion (A.33)
is a $p \times p \times p$ array of partial second-order derivatives. By assumption, there
is a neighborhood $B$ of $\theta_0$ for which each entry of $\nabla_\theta^2 U^*(x,\theta,q_n)$ is
dominated by $g_0(x)$ for some $g_0(x) \ge 0$ for all $\theta \in B$. With probability tending to
1,


n 
i=l 



i=l 



which is bounded in probability by the law of large numbers. Since the third 
term in the expansion (A. 33) is of higher order than the second term, the 
normality result follows by applying Slutsky's lemma. 

Proof of Theorem 5.1. From the second-order Taylor expansion of a(x n ;0 n ) 
about 6* one can obtain 



(a(x n ;0 n ) - a(x n ;0^)) 
a n a'(x n ;0*) 



(A.34) 



9 n -^) + ^^) v - ( ~_^ )2 



n • n n> + 



2o n a!(x n ;9*) 

1 a"(x n ;6* n ) a"(x n ;9) 
2a n a'(x n ;9*) a"(x n ;9* n ) 



*\2 



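where $\bar\theta$ is a value between $\tilde\theta_n$ and $\theta_n^*$. We need to show that the second term
in (A.34) converges to zero in probability, that is,

\[
\frac{\alpha''(x_n; \theta_n^*)}{\alpha'(x_n; \theta_n^*)}
\frac{\alpha''(x_n; \bar\theta)}{\alpha''(x_n; \theta_n^*)}
\frac{\sigma_n}{\sqrt{n}} \frac{n (\tilde\theta_n - \theta_n^*)^2}{\sigma_n^2}
\xrightarrow{P} 0.
\tag{A.35}
\]

Since $\sqrt{n}(\tilde\theta_n - \theta_n^*)/\sigma_n \xrightarrow{\mathcal{D}} N(0, 1)$ and $\sigma_n$ is upper bounded, we need

\[
\frac{\alpha''(x_n; \theta_n^*)}{\alpha'(x_n; \theta_n^*) \sqrt{n}}
\frac{\alpha''(x_n; \bar\theta)}{\alpha''(x_n; \theta_n^*)}
\xrightarrow{P} 0.
\tag{A.36}
\]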



This holds under the assumptions of the theorem. This completes the proof 
of the theorem. 
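As a worked illustration of condition (A.36), and not part of the original argument, suppose the model is exponential with mean $\theta$ and take $\alpha(x; \theta)$ to be the upper-tail probability $P_\theta(X > x)$. Both choices are assumptions made here for concreteness, with primes read as derivatives with respect to $\theta$. A sketch of the computation:

```latex
\begin{align*}
\alpha(x;\theta) &= e^{-x/\theta}, \\
\alpha'(x;\theta) &= \frac{x}{\theta^{2}}\, e^{-x/\theta}, \\
\alpha''(x;\theta) &= \left(\frac{x^{2}}{\theta^{4}} - \frac{2x}{\theta^{3}}\right) e^{-x/\theta}, \\
\frac{\alpha''(x_n;\theta_n^*)}{\alpha'(x_n;\theta_n^*)\sqrt{n}}
  &= \frac{1}{\sqrt{n}}\left(\frac{x_n}{\theta_n^{*2}} - \frac{2}{\theta_n^{*}}\right).
\end{align*}
```

Under these assumptions the first factor in (A.36) tends to zero when $x_n/\sqrt{n} \to 0$ and $\theta_n^*$ stays bounded away from zero. The second ratio in (A.36), which involves the intermediate point of the Taylor expansion, would still have to be controlled separately.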
Proof of Theorem 5.2. The rationale presented here is analogous to that
of Theorem 5.1. From the second-order Taylor expansion of $\rho(s_n; \tilde\theta_n)$ about
$\theta_n^*$ one can obtain

\[
\frac{\sqrt{n}\,(\rho(s_n; \tilde\theta_n) - \rho(s_n; \theta_n^*))}{\sigma_n \rho'(s_n; \theta_n^*)}
= \sqrt{n}\,\frac{(\tilde\theta_n - \theta_n^*)}{\sigma_n}
+ \frac{1}{2} \frac{\rho''(s_n; \bar\theta)}{\sigma_n \rho'(s_n; \theta_n^*)} \sqrt{n}\,(\tilde\theta_n - \theta_n^*)^2,
\tag{A.37}
\]

where $\bar\theta$ is a value between $\tilde\theta_n$ and $\theta_n^*$. The assumptions combined with
Theorem 3.2 imply that the second term in (A.37) converges to 0 in probability.
Hence, the central limit theorem follows from Slutsky's lemma.

APPENDIX B: MULTIVARIATE NORMAL $N_p(\mu, \Sigma)$. ASYMPTOTIC DISTRIBUTION OF THE MLqE OF $\Sigma$

The log-likelihood function of a multivariate normal is

\[
\log f(x; \mu, \Sigma) = -\frac{p}{2} \log(2\pi) - \frac{1}{2} \log|\Sigma| - \frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu).
\tag{B.1}
\]



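Recall that the surrogate parameter is $\theta^* = (\mu^T, q \operatorname{vech}^T \Sigma)^T$. The asymptotic
variance is computed as $V = J^{-1}(\theta^*) K(\theta^*) J^{-1}(\theta^*)$, where

\begin{align}
K(\theta^*) &= E_{\theta_0}[f(x; \theta^*)^{2(1-q)} U(x; \theta^*)^T U(x; \theta^*)] \tag{B.2} \\
&= c_2 E^{(2)}[U(x; \theta^*)^T U(x; \theta^*)] \tag{B.3}
\end{align}

and

\begin{align}
J(\theta^*) &= -q E_{\theta_0}[f(x; \theta^*)^{1-q} U(x; \theta^*)^T U(x; \theta^*)] \tag{B.4} \\
&= -q c_1 E^{(1)}[U(x; \theta^*)^T U(x; \theta^*)], \tag{B.5}
\end{align}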

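where $E^{(r)}$ denotes expectation taken with respect to a normal with mean
$\mu$ and covariance matrix $[r(1 - q) + 1]^{-1} \Sigma$, $r = 1, 2$, and the normalizing
constant $c_r$ is

\begin{align}
c_r := E_{\theta_0}[f(x; \theta^*)^{r(1-q)}]
&= \frac{(2\pi)^{p/2} \, |[r(1-q)+1]^{-1} \Sigma|^{1/2}}{(2\pi)^{rp(1-q)/2} \, |q\Sigma|^{r(1-q)/2} \, (2\pi)^{p/2} \, |\Sigma|^{1/2}} \tag{B.6} \\
&= \frac{[r(1-q)+1]^{-p/2}}{[(2\pi)^p |q\Sigma|]^{r(1-q)/2}}. \tag{B.7}
\end{align}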
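Note that $K$ and $J$ can be partitioned into block form

\[
K = \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix},
\qquad
J = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix},
\tag{B.8}
\]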
where $K_{11}$ and $J_{11}$ depend on second-order derivatives of $U$ with respect to
$\mu$, and $K_{22}$ and $J_{22}$ depend on second-order derivatives with respect to $\operatorname{vech} \Sigma$.
The off-diagonal matrices $K_{12}$, $K_{21}$ depend on mixed derivatives of $U$ with
respect to $\mu$ and $\operatorname{vech}^T \Sigma$. Since the mixed moments of order three are zero,
one can check that $K_{21} = K_{12}^T = 0$. Consequently, only the calculation of
$K_{11}$, $K_{22}$, $J_{11}$ and $J_{22}$ is required and the expression of the asymptotic
variance is given by

\[
V = \begin{bmatrix} V_{11} & 0 \\ 0 & V_{22} \end{bmatrix}
= \begin{bmatrix} J_{11}^{-1} K_{11} J_{11}^{-1} & 0 \\ 0 & J_{22}^{-1} K_{22} J_{22}^{-1} \end{bmatrix}.
\tag{B.9}
\]



Next, we compute the entries of $K$ and $J$ using the approach employed by
McCulloch [25] for the usual log-likelihood function. First, we use standard
matrix differentiation to compute $K_{11}$ and $J_{11}$,

\begin{align}
K_{11} &= c_2 E^{(2)}[(q\Sigma)^{-1} (x - \mu)^T (x - \mu) (q\Sigma)^{-1}] \tag{B.10} \\
&= c_2 q^{-2} [2(1 - q) + 1]^{-1} \Sigma^{-1}, \tag{B.11}
\end{align}

and similarly one can obtain $J_{11} = -c_1 q^{-1} [(1 - q) + 1]^{-1} \Sigma^{-1}$. Some
straightforward algebra gives

\[
V_{11} = J_{11}^{-1} K_{11} J_{11}^{-1} = \frac{(2 - q)^{2+p}}{(3 - 2q)^{1+p/2}} \, \Sigma.
\tag{B.12}
\]
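The algebra leading to (B.12) can be checked numerically. The sketch below assumes the forms read from this appendix: $c_r$ as in (B.7), the scalar coefficient of $\Sigma^{-1}$ in $K_{11}$ from (B.11), and the magnitude of the corresponding coefficient in $J_{11}$. The function names and the `det_sigma` argument are hypothetical and chosen only for illustration.

```python
import math


def c_r(r, q, p, det_sigma=1.0):
    """Normalizing constant in the form of (B.7), using |q*Sigma| = q**p * |Sigma|."""
    base = (2 * math.pi) ** p * q ** p * det_sigma
    return (r * (1 - q) + 1) ** (-p / 2) / base ** (r * (1 - q) / 2)


def v11_factor_from_blocks(q, p, det_sigma=1.0):
    """Scalar multiplying Sigma in J11^{-1} K11 J11^{-1}."""
    # Coefficient of Sigma^{-1} in K11, as in (B.11).
    k11 = c_r(2, q, p, det_sigma) * q ** -2 / (2 * (1 - q) + 1)
    # Magnitude of the coefficient of Sigma^{-1} in J11.
    j11 = c_r(1, q, p, det_sigma) / (q * ((1 - q) + 1))
    # Any sign attached to J11 cancels in the sandwich product.
    return k11 / j11 ** 2


def v11_factor_closed_form(q, p):
    """Scalar multiplying Sigma in (B.12)."""
    return (2 - q) ** (2 + p) / (3 - 2 * q) ** (1 + p / 2)


for q in (0.8, 0.95, 1.0, 1.1):
    for p in (1, 2, 5):
        lhs = v11_factor_from_blocks(q, p, det_sigma=2.5)
        assert math.isclose(lhs, v11_factor_closed_form(q, p), rel_tol=1e-9)

print(v11_factor_closed_form(1.0, 3))  # 1.0
```

The factor depends on $c_2 / c_1^2$ only, so the determinant term cancels, and at $q = 1$ the factor equals 1, so that $V_{11}$ reduces to $\Sigma$.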



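Next, we compute $V_{22}$. Let $z := \Sigma^{-1/2} (x - \mu)$. We use the following relationship
derived by McCulloch ([25], page 682):

\[
\begin{aligned}
& E\{[\nabla_{\operatorname{vech} \Sigma} \, \ell(\theta)]^T [\nabla_{\operatorname{vech} \Sigma} \, \ell(\theta)]\} \\
&\quad = \tfrac{1}{4} G^T (\Sigma^{-1/2} \otimes \Sigma^{-1/2}) (E[(z \otimes z)(z^T \otimes z^T)] - \operatorname{vec} I_p \operatorname{vec}^T I_p) \\
&\qquad \times (\Sigma^{-1/2} \otimes \Sigma^{-1/2}) G.
\end{aligned}
\tag{B.13}
\]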

Moreover, a result by Magnus and Neudecker ([24], page 388) shows

\[
E[(z \otimes z)(z^T \otimes z^T)] = I_{p^2} + K_{p,p} + \operatorname{vec} I_p \operatorname{vec}^T I_p,
\tag{B.14}
\]

where $K_{p,p}$ denotes the commutation matrix (see Magnus and Neudecker
[24]). To compute $K_{22}$ and $J_{22}$, we need to evaluate (B.13) at $\theta^* = (\mu^T, q \operatorname{vech}^T \Sigma)^T$,
replacing the expectation operator with $c_r E^{(r)}[\cdot]$. In particular,



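\[
\begin{aligned}
\{E^{(r)}[(z \otimes z)(z^T \otimes z^T)] - \operatorname{vec} I_p \operatorname{vec}^T I_p\} G
&= (r(1 - q) + 1)^{-2} \{I_{p^2} + K_{p,p}\} G \\
&= 2 (r(1 - q) + 1)^{-2} G,
\end{aligned}
\tag{B.15}
\]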
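where the last equality follows from the fact that $K_{p,p} G = G$. Therefore,

\begin{align}
K_{22} &= \frac{1}{4q^2} c_2 G^T (\Sigma^{-1/2} \otimes \Sigma^{-1/2}) \tag{B.16} \\
&\quad \times (E^{(2)}[(z \otimes z)(z^T \otimes z^T)] - \operatorname{vec} I_p \operatorname{vec}^T I_p)(\Sigma^{-1/2} \otimes \Sigma^{-1/2}) G \tag{B.17} \\
&= \frac{1}{4q^2} c_2 [(2(1 - q) + 1)^{-2} + 1] \, G^T (\Sigma^{-1} \otimes \Sigma^{-1}) G. \tag{B.18}
\end{align}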

A similar calculation gives the corresponding expression for $J_{22}$.

Finally, we assemble (B.19) and (B.20) obtaining

\[
V_{22} = J_{22}^{-1} K_{22} J_{22}^{-1}
= \frac{4 [(3 - 2q)^2 + 1] (2 - q)^{4+p}}{[(2 - q)^2 + 1]^2 (3 - 2q)^{2+p/2}} \, [G^T (\Sigma^{-1} \otimes \Sigma^{-1}) G]^{-1}.
\tag{B.21}
\]
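The two matrix facts used in this appendix, the fourth-moment identity (B.14) and the relation $K_{p,p} G = G$, can be verified exactly for small $p$. The following is a minimal sketch under stated assumptions: $z \sim N(0, I_p)$, column-major $\operatorname{vec}$, $G$ taken to be the duplication matrix with $\operatorname{vec} S = G \operatorname{vech} S$ for symmetric $S$, and the fourth moments filled in with Isserlis' theorem. All helper names are hypothetical.

```python
from itertools import product


def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]


def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def fourth_moment(p):
    """E[(z kron z)(z kron z)^T] for z ~ N(0, I_p), entry by entry."""
    d = lambda a, b: int(a == b)
    M = zeros(p * p, p * p)
    for i, j, k, l in product(range(p), repeat=4):
        # Row i*p + j indexes z_i z_j; column k*p + l indexes z_k z_l.
        # Isserlis: E[z_i z_j z_k z_l] is the sum over the three pairings.
        M[i * p + j][k * p + l] = d(i, j) * d(k, l) + d(i, k) * d(j, l) + d(i, l) * d(j, k)
    return M


def commutation(p):
    """K_{p,p}: the permutation matrix with K vec(A) = vec(A^T)."""
    K = zeros(p * p, p * p)
    for i, j in product(range(p), repeat=2):
        K[i * p + j][j * p + i] = 1
    return K


def duplication(p):
    """G: the 0/1 matrix with vec(S) = G vech(S) for symmetric S."""
    pairs = [(i, j) for j in range(p) for i in range(j, p)]  # lower triangle, by column
    col = {ij: c for c, ij in enumerate(pairs)}
    G = zeros(p * p, len(pairs))
    for i, j in product(range(p), repeat=2):
        # vec position j*p + i holds S[i][j], which equals S[max][min].
        G[j * p + i][col[(max(i, j), min(i, j))]] = 1
    return G


def rhs_b14(p):
    """I_{p^2} + K_{p,p} + vec(I_p) vec(I_p)^T."""
    n = p * p
    K = commutation(p)
    v = [int(a // p == a % p) for a in range(n)]  # vec(I_p)
    return [[int(a == b) + K[a][b] + v[a] * v[b] for b in range(n)] for a in range(n)]


p = 3
G = duplication(p)
print(fourth_moment(p) == rhs_b14(p))   # identity (B.14)
print(matmul(commutation(p), G) == G)   # K_{p,p} G = G
```

Because every entry is an integer, the comparisons are exact rather than approximate; a Monte Carlo check would work as well but would only agree up to sampling error.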